THEORETIC CRITERIA

Image segmentation is a long-studied and important problem in image processing. Different solutions have been proposed, many of which follow the information theoretic paradigm. While these information theoretic segmentation methods often produce excellent empirical results, their theoretical properties are still largely unknown. The main goal of this paper is to conduct a rigorous theoretical study into the statistical consistency properties of such methods. To be more specific, this paper investigates if these methods can accurately recover the true number of segments together with their true boundaries in the image as the number of pixels tends to infinity. Our theoretical results show that both the Bayesian information criterion (BIC) and the minimum description length (MDL) principle can be applied to derive statistically consistent segmentation methods, while the same is not true for the Akaike information criterion (AIC). Numerical experiments were conducted to illustrate and support our theoretical findings.

1. Introduction. Image segmentation aims to partition an image into a set of nonoverlapping regions so that pixels within the same region are homogeneous with respect to some characteristic (e.g., gray value or roughness), while pixels from adjacent regions are significantly different with respect to the same characteristic. It is a fundamental problem in image processing, as very often it is necessary to first group the highly localized pixels into more global and meaningful segmented objects to facilitate the extraction of useful information. In this paper, gray value is the image characteristic that forms the basis for segmentation. For general introductions to image segmentation, see, for example, Glasbey and Horgan (1995) and Haralick and Shapiro (1992).

A grayscale image can be seen as a two-dimensional (2D) surface living in 
a three-dimensional space. Therefore one popular approach to segmenting 
it is to model it by a 2D piecewise constant function, with the set of all 
discontinuity points defining the region boundaries of the image. Examples 
of segmentation methods that follow this approach include Kanungo et al. 
(1995), LaValle and Hutchinson (1995), Leclerc (1989), Lee (1998, 2000), 
Luo and Khoshgoftaar (2006) and Wang, Ju and Wang (2009). As will be demonstrated below, segmenting images with this approach can be recast as a model selection problem, and a crucial issue for its success is the choice of the model complexity, which is equivalent to choosing the number of regions together with the shapes of their boundaries. Common information theoretic methods such as the Akaike information criterion (AIC) [Akaike (1974)], the Bayesian information criterion (BIC), also known as the Schwarz information criterion [Schwarz (1978)], and the minimum description length (MDL) principle [Rissanen (1989, 2007)] have been adopted to solve this problem; for example, see Kanungo et al. (1995), Leclerc (1989), Lee (1998, 2000), Luo and Khoshgoftaar (2006), Murtagh, Raftery and Starck (2005), Stanford and Raftery (2002), Zhang and Modestino (1990) and Zhu and Yuille (1996). While many of these methods produce excellent practical results, their theoretical properties are still largely unknown. The goal of this paper is to conduct a systematic study of the theoretical properties of these methods, with the hope of enhancing our understanding of their performance at both theoretical and empirical levels. To the best of our knowledge, this is the first time that such a rigorous theoretical study has been performed for image segmentation methods.

The rest of this paper is organized as follows. Background material is presented in Section 2. Section 3 presents our main theoretical results. These theoretical results are empirically verified by numerical experiments in Section 4, and Section 5 applies the MDL criterion to a real synthetic aperture radar image. Concluding remarks are offered in Section 6, while technical details are deferred to the Appendix.

2. Background. Denote by $f$ the true image and by $\Xi_n = \{x_1, \dots, x_n\}$ the set of $n$ grid points at which a noisy version of $f$ is sampled. Without loss of generality it is assumed that the domain of $f$ is $[0,1]^2$. As mentioned before, $f$ is modeled as a 2D piecewise constant function as follows. Write $f_i = f(x_i)$ and $\mathbf{f} = (f_1, \dots, f_n)'$. Let the number of regions (or pieces or segments) in $f$ be $m$, and denote the gray value and domain of the $\nu$th region as $\mu_\nu$ and $R_\nu$, respectively. Then we have, for $i = 1, \dots, n$,

$$f_i = \mu_\nu \quad \text{if } x_i \in R_\nu, \qquad \nu = 1, \dots, m, \tag{1}$$

$$\bigcup_{\nu=1}^{m} R_\nu = [0,1]^2 \quad \text{and} \quad R_\nu \cap R_{\nu'} = \emptyset, \quad \nu \neq \nu'. \tag{2}$$

In the sequel we write $\mathbf{R} = (R_1, \dots, R_m)$ and $\boldsymbol{\mu} = (\mu_1, \dots, \mu_m)'$. Thus $\mathbf{R}$ defines a segmentation of $f$. The observed noisy version $\mathbf{y} = (y_1, \dots, y_n)'$ of $\mathbf{f}$ is modeled as

$$y_i = f_i + \varepsilon_i, \qquad i = 1, \dots, n, \tag{3}$$

where the noise terms $\varepsilon_i$ are independent, identically distributed random variables with zero mean and variance $\sigma^2$. Given $\mathbf{y}$, the goal is then to estimate $\mathbf{f}$, which is equivalent to estimating $m$, $\mathbf{R}$ and $\boldsymbol{\mu}$.
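As a concrete reading of (1)-(3), the minimal Python sketch below stores a segmentation as a grid of region indices on a regular pixel grid and generates a noisy image from it. The function name `simulate_image` is hypothetical, and the Gaussian noise is an assumption made only for illustration: (3) requires nothing more than i.i.d. zero-mean noise with variance $\sigma^2$.

```python
import random
from typing import List

def simulate_image(labels: List[List[int]], mu: List[float],
                   sigma: float, seed: int = 0) -> List[List[float]]:
    """Return the noisy image y with y_i = f_i + eps_i, as in (1) and (3).

    labels[r][c] is the index v of the region containing pixel (r, c), so the
    regions cover the grid and are pairwise disjoint, mirroring (2).
    The true gray value of that pixel is f_i = mu[v].
    Gaussian noise is an illustrative assumption; (3) only asks for i.i.d.
    zero-mean noise with variance sigma**2.
    """
    rng = random.Random(seed)
    return [[mu[v] + rng.gauss(0.0, sigma) for v in row] for row in labels]

# Two regions on a 4 x 4 grid: left half (region 0) and right half (region 1).
labels = [[0, 0, 1, 1] for _ in range(4)]
y = simulate_image(labels, mu=[0.0, 10.0], sigma=1.0, seed=42)
```

Because every pixel carries exactly one label, the regions automatically cover the grid and are pairwise disjoint, which is the discrete counterpart of (2).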

For simplicity, denote by $\theta_m = (m, \mathbf{R}, \boldsymbol{\mu})'$ a generic parameter vector. Estimating $\mathbf{f}$ is hence equivalent to the model selection problem in which each model is determined by the parameter $\theta_m$. Let $\mathrm{RSS}_m = \sum_{i=1}^{n} (y_i - f_i)^2$ be the corresponding residual sum of squares, where $f_i$ is the fitted value determined by $\theta_m$ through (1). Notice that different values of $m$ lead to different numbers of parameters in $\theta_m$. Also notice that $\theta_m$ cannot be estimated by minimizing $\mathrm{RSS}_m$ alone, as $\mathrm{RSS}_m$ can be made arbitrarily small as $m$ tends to $n$. One way to resolve this issue is to add to $\mathrm{RSS}_m$ a penalty term that suitably penalizes the complexity of $\theta_m$. As alluded to before, information theoretic model selection methods like AIC, BIC and MDL can be used to derive such a penalty. We first focus on the MDL criterion derived by Lee (2000),



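$$\mathrm{MDL}(m, \mathbf{R}) = m \ln n + \frac{\ln 3}{2} \sum_{\nu=1}^{m} b_\nu + \frac{1}{2} \sum_{\nu=1}^{m} \ln a_\nu + \frac{n}{2} \ln\left(\frac{\mathrm{RSS}_m}{n}\right), \tag{4}$$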

where each region $R_\nu$ enters through its "area" $a_\nu$ (in terms of number of pixels) and "perimeter" $b_\nu$ (in terms of number of pixel edges). These quantities are formally defined as

$$a_\nu = \#(\Xi_n \cap R_\nu) \quad \text{and} \quad b_\nu = \#(\Xi_n \cap \partial R_\nu),$$

with $\#A$ and $\partial A$ indicating, respectively, the cardinality and the boundary of the set $A$. Observe that, once the estimates $\hat m$ and $\hat{\mathbf{R}}$ are specified, $\boldsymbol{\mu}$ can be uniquely estimated by

$$\hat\mu_\nu = \frac{1}{a_\nu} \sum_{i:\, x_i \in \hat R_\nu} y_i \quad \text{for all } \nu, \tag{5}$$

and therefore $\boldsymbol{\mu}$ is dropped from the argument list of $\mathrm{MDL}(m, \mathbf{R})$. To sum up, the MDL-based method of Lee (2000) estimates $m$ and $\mathbf{R}$ as the joint minimizer of (4), which is equivalent to saying

$$(\hat m, \hat{\mathbf{R}}) = \operatorname*{arg\,min}_{m \le M,\ \mathbf{R}} \frac{2}{n}\, \mathrm{MDL}(m, \mathbf{R}), \tag{6}$$

and $\hat{\boldsymbol{\mu}}$ is given by (5). Practical algorithms, developed, for example, by Lee (2000) and Zhu and Yuille (1996), can be used to solve (6).
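To show how the ingredients of (4) and (5) fit together, here is a small Python sketch that scores a candidate segmentation given as a label grid. It assumes the form $m\ln n + \tfrac{\ln 3}{2}\sum_\nu b_\nu + \tfrac12\sum_\nu \ln a_\nu + \tfrac n2 \ln(\mathrm{RSS}_m/n)$ for (4), and it counts the perimeter $b_\nu$ as the number of pixel edges shared with a different region, ignoring the image border; that edge convention is our own assumption and may differ in detail from Lee (2000). All function names are hypothetical. The sketch only evaluates a given segmentation; it does not search for the minimizer in (6).

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def region_stats(labels: List[List[int]]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Areas a_v (pixel counts) and perimeters b_v (pixel edges) of each region.

    Assumed convention: an edge between two 4-adjacent pixels with different
    labels counts once for each of the two regions; image-border edges are
    ignored.
    """
    area: Dict[int, int] = defaultdict(int)
    perim: Dict[int, int] = defaultdict(int)
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            v = labels[r][c]
            area[v] += 1
            # Look only right and down so that each interior edge is seen once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and labels[rr][cc] != v:
                    perim[v] += 1
                    perim[labels[rr][cc]] += 1
    return dict(area), {v: perim[v] for v in area}

def region_means(labels: List[List[int]], y: List[List[float]]) -> Dict[int, float]:
    """Estimator (5): the sample mean of y over the pixels of each region."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for lab_row, y_row in zip(labels, y):
        for v, val in zip(lab_row, y_row):
            sums[v] += val
            counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}

def rss(labels: List[List[int]], y: List[List[float]]) -> float:
    """Residual sum of squares of the piecewise constant fit."""
    mu_hat = region_means(labels, y)
    return sum((val - mu_hat[v]) ** 2
               for lab_row, y_row in zip(labels, y)
               for v, val in zip(lab_row, y_row))

def mdl(labels: List[List[int]], y: List[List[float]]) -> float:
    """m ln n + (ln 3 / 2) sum b_v + (1/2) sum ln a_v + (n/2) ln(RSS / n)."""
    area, perim = region_stats(labels)
    n, m = sum(area.values()), len(area)
    return (m * math.log(n)
            + 0.5 * math.log(3) * sum(perim.values())
            + 0.5 * sum(math.log(a) for a in area.values())
            + 0.5 * n * math.log(rss(labels, y) / n))
```

For a 4 x 4 image split into two 8-pixel halves, each half has area 8 and perimeter 4 under this convention, so the perimeter term contributes $\tfrac{\ln 3}{2} \cdot 8$.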

One can also use AIC and BIC to derive penalty terms to add to $\mathrm{RSS}_m$, and the resulting penalties will be proportional to the number of "free" (and independent) parameters in the fitted image $\hat{\mathbf{f}}$ [e.g., Murtagh, Raftery and Starck (2005), Stanford and Raftery (2002) and Zhang and Modestino (1990)]. This leads to the following question: what would be a meaningful way of counting the number of free parameters in $\hat{\mathbf{f}}$? There seems to be no unique answer, but we shall follow Murtagh, Raftery and Starck (2005) and Stanford and Raftery (2002) and model each true pixel value $f_i$ with a mixture distribution of $m$ Gaussians, where the mean, variance and mixing probability for the $\nu$th Gaussian are $\mu_\nu$, $\sigma^2$ and $a_\nu / \sum_\nu a_\nu$, respectively. As there are $m$ of the $\mu_\nu$'s, one $\sigma^2$ and $m - 1$ free mixing probabilities, the total number of free parameters is $m + 1 + (m - 1) = 2m$. With this, the corresponding AIC and BIC segmentation criteria are

$$\mathrm{AIC}(m, \mathbf{R}) = 2m + \frac{n}{2} \ln\left(\frac{\mathrm{RSS}_m}{n}\right)$$

and

$$\mathrm{BIC}(m, \mathbf{R}) = m \ln n + \frac{n}{2} \ln\left(\frac{\mathrm{RSS}_m}{n}\right),$$

respectively. The AIC and BIC estimates for $(m, \mathbf{R})$ are then given by

$$(\hat m, \hat{\mathbf{R}}) = \operatorname*{arg\,min}_{m \le M,\ \mathbf{R}} \frac{2}{n}\, \mathrm{AIC}(m, \mathbf{R}) \tag{7}$$

and

$$(\hat m, \hat{\mathbf{R}}) = \operatorname*{arg\,min}_{m \le M,\ \mathbf{R}} \frac{2}{n}\, \mathrm{BIC}(m, \mathbf{R}), \tag{8}$$

respectively. Observe that for both $\mathrm{AIC}(m, \mathbf{R})$ and $\mathrm{BIC}(m, \mathbf{R})$, the region boundaries $\mathbf{R}$ are not explicitly penalized; they enter the criteria only through $\mathrm{RSS}_m$. Also observe that the penalty term of $\mathrm{AIC}(m, \mathbf{R})$ is independent of $n$.
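Because the AIC and BIC penalties depend on the segmentation only through $m$, both criteria are functions of $(m, \mathrm{RSS}_m, n)$ alone. The Python sketch below, which assumes the forms $2m + \tfrac n2\ln(\mathrm{RSS}_m/n)$ and $m\ln n + \tfrac n2\ln(\mathrm{RSS}_m/n)$ and uses hypothetical function names, scores a few candidate fits and returns the minimizing $m$, in the spirit of (7) and (8). Adding one region raises the AIC penalty by the constant 2, but raises the BIC penalty by $\ln n$, which exceeds 2 once $n \ge 8$.

```python
import math
from typing import Callable, Sequence, Tuple

def aic(m: int, rss: float, n: int) -> float:
    """AIC(m, R) = 2m + (n/2) ln(RSS_m / n)."""
    return 2 * m + 0.5 * n * math.log(rss / n)

def bic(m: int, rss: float, n: int) -> float:
    """BIC(m, R) = m ln n + (n/2) ln(RSS_m / n)."""
    return m * math.log(n) + 0.5 * n * math.log(rss / n)

def select(candidates: Sequence[Tuple[int, float]], n: int,
           criterion: Callable[[int, float, int], float]) -> int:
    """Return the m of the (m, RSS_m) pair minimising (2/n) * criterion."""
    best_m, _ = min(candidates, key=lambda c: 2.0 / n * criterion(c[0], c[1], n))
    return best_m

fits = [(1, 100.0), (2, 90.0), (3, 85.0)]  # toy (m, RSS_m) pairs with n = 100
print(select(fits, 100, aic), select(fits, 100, bic))
```

With $n = 100$, going from $m = 2$ to $m = 3$ lowers $\tfrac n2\ln(\mathrm{RSS}_m/n)$ by $50\ln(90/85) \approx 2.86$: enough to pay AIC's extra 2 but not BIC's extra $\ln 100 \approx 4.61$, so `select` returns 3 under AIC and 2 under BIC.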

Before we proceed further, it is worthwhile to point out a major difference between the variable selection problem in linear regression models and the image segmentation problem. In variable selection for linear regression, the goal is to select the significant predictors and remove the insignificant ones from the model. In other words, some "data" are not used in estimating the model parameters. For image segmentation, the goal is to group homogeneous pixels together to form segmented objects, and in this process all data (i.e., all pixel values) are always used to estimate the model parameters. Because of this difference, variable selection in linear regression and image segmentation are two distinct problems, and hence existing theory from classical linear regression modeling cannot be directly applied to image segmentation.

3. Main results. This section presents our main theoretical findings. Briefly, both the BIC and MDL segmentation solutions are statistically consistent in a well-defined sense, while the AIC solution is not.




The consistency of the BIC and MDL solutions is investigated at two levels. First, we will establish the strong consistency of $\hat{\mathbf{R}}$ if the true number of regions $m = m^\circ$ can be assumed known. Second, if the true value $m^\circ$ is unknown and if the noise is restricted to be Gaussian, we will establish the weak consistency of $\hat m$ and $\hat{\mathbf{R}}$. While the existence of a true underlying model was not essential for the practical use of (6)-(8), we will, in this section, assume that the image of interest is indeed of the form (1)-(2) and shall denote the associated true gray values and segmentation by $\boldsymbol{\mu}^\circ = (\mu_1^\circ, \dots, \mu_{m^\circ}^\circ)'$ and $\mathbf{R}^\circ = (R_1^\circ, \dots, R_{m^\circ}^\circ)$, respectively.

In order to enable large sample results, we impose further technical conditions. First, to ensure sufficient separation of the regions and to avoid sets of zero (Lebesgue) measure in the decomposition of $[0,1]^2$, it will be assumed throughout that each $R_\nu^\circ$ contains an open ball of suitably small radius: for all $\nu = 1, \dots, m^\circ$, there are $z_\nu \in R_\nu^\circ$ and $\varepsilon > 0$ such that

$$B_\varepsilon(z_\nu) = \{z \in [0,1]^2 : \|z - z_\nu\| < \varepsilon\} \subset R_\nu^\circ,$$

with $\|\cdot\|$ denoting the Euclidean norm on $\mathbb{R}^2$. All candidate segmentations $\mathbf{R}$ from which the estimate $\hat{\mathbf{R}}$ is produced in any of (6) to (8) are restricted to satisfy the same condition.

Next, we assume that the set of grid points $\Xi_n$ is dense in $[0,1]^2$ in the sense that, for all $\varepsilon > 0$, there is an $n_0 \ge 1$ such that

$$[0,1]^2 \subset \bigcup_{i=1}^{n} B_\varepsilon(x_i) \qquad \text{for all } n \ge n_0. \tag{9}$$

Last, we assume further that the number of grid points in any given region grows with the sample size (at the same linear rate) and therefore require that $a_\nu = \lfloor n \alpha_\nu \rfloor$ with $\sum_\nu \alpha_\nu = 1$, where $\lfloor \cdot \rfloor$ denotes the integer part.
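Condition (9) is satisfied, for example, by the regular $k \times k$ grid of pixel centres $\bigl((j + \tfrac12)/k, (l + \tfrac12)/k\bigr)$ with $n = k^2$, a layout assumed here for illustration: every point of $[0,1]^2$ lies within half a pixel diagonal, $\sqrt2/(2k)$, of some centre, so any $\varepsilon > 0$ is met once $k > \sqrt2/(2\varepsilon)$. The Python sketch below (hypothetical helper names) computes this covering radius and the smallest admissible $k$, and cross-checks the radius by brute force on a finer lattice of probe points.

```python
import math

def covering_radius(k: int) -> float:
    """Largest distance from a point of [0,1]^2 to its nearest pixel centre."""
    return math.sqrt(2.0) / (2.0 * k)

def smallest_k(eps: float) -> int:
    """Smallest k for which the open eps-balls around the centres cover [0,1]^2."""
    return math.floor(math.sqrt(2.0) / (2.0 * eps)) + 1

def brute_force_radius(k: int, probes: int = 200) -> float:
    """Estimate the covering radius by probing a (probes+1) x (probes+1) lattice."""
    worst = 0.0
    for i in range(probes + 1):
        for j in range(probes + 1):
            x, y = i / probes, j / probes
            # The nearest centre along each axis is the centre of the cell holding x.
            cx = (min(k - 1, int(x * k)) + 0.5) / k
            cy = (min(k - 1, int(y * k)) + 0.5) / k
            worst = max(worst, math.hypot(x - cx, y - cy))
    return worst

print(smallest_k(0.1), covering_radius(8))
```

Since the grid is a product of two one-dimensional grids, the nearest centre can be found coordinate by coordinate, which keeps the brute-force check simple.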

3.1. Consistency of MDL segmentation. We first consider the MDL segmentation solution (6). Suppose for now that $m = m^\circ$ is known, and let $\hat{\mathbf{R}} = \operatorname*{arg\,min}_{\mathbf{R}} \frac{2}{n}\, \mathrm{MDL}(m^\circ, \mathbf{R})$. In this case, we have the following strong consistency result.

Theorem 3.1. Let $\{y_i\}$ be the sequence of random variables specified in (3), and assume that $m = m^\circ$ is known. Then

$$\hat{\mathbf{R}} \to \mathbf{R}^\circ \quad \text{with probability one as } n \to \infty.$$

The almost sure convergence in the theorem is defined as follows. Denote by $\prec$ the lexicographical order in $\mathbb{R}^2$, that is, $a = (a_1, a_2) \prec b = (b_1, b_2)$ if and only if either $a_1 < b_1$, or $a_1 = b_1$ and $a_2 < b_2$. We assume throughout that any segmentation $\mathbf{R} = (R_1, \dots, R_m)$ satisfies $R_1 \prec \cdots \prec R_m$, where $R_\nu \prec R_\kappa$ if and only if there is $z_\nu \in R_\nu$ such that $z_\nu \prec z_\kappa$ for all $z_\kappa \in R_\kappa$. For two sets $A$ and $B$, let now $A \triangle B$ be their symmetric difference. Denote by $\lambda^2$ the Lebesgue measure in $\mathbb{R}^2$ restricted to $[0,1]^2$ and set $\hat{\mathbf{R}} \triangle \mathbf{R}^\circ = \bigcup_{\nu=1}^{m^\circ} \hat R_\nu \triangle R_\nu^\circ$. Then, we mean by $\hat{\mathbf{R}} \to \mathbf{R}^\circ$ with probability one that $P(\limsup_n \{\lambda^2(\hat{\mathbf{R}} \triangle \mathbf{R}^\circ) = 0\}) = 1$. In other words, the Lebesgue measure of the random sets $\hat{\mathbf{R}} \triangle \mathbf{R}^\circ$ is zero in the limit with probability one.
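On a pixel grid, $\lambda^2(\hat{\mathbf{R}} \triangle \mathbf{R}^\circ)$ can be approximated by the fraction of pixels lying in some $\hat R_\nu \triangle R_\nu^\circ$, once both segmentations are indexed by the order $\prec$. For finite, disjoint pixel sets, $R_\nu \prec R_\kappa$ holds exactly when the lexicographically smallest pixel of $R_\nu$ precedes that of $R_\kappa$, which is what the Python sketch below exploits. Identifying pixel $(r, c)$ with the point $(r, c)$, and taking the union over $\nu \le \min(\hat m, m^\circ)$ so that the function also covers the case of unknown $m^\circ$, are illustrative choices; the function names are hypothetical.

```python
from typing import Dict, List

def canonical_order(labels: List[List[int]]) -> Dict[int, int]:
    """Rank of each region under the order R_1 < R_2 < ... of the text.

    Pixels are scanned in lexicographic order of (r, c); a label is first met
    at its lexicographically smallest pixel, so first-seen order is the
    region order (valid for finite, disjoint pixel sets).
    """
    rank: Dict[int, int] = {}
    for row in labels:
        for v in row:
            if v not in rank:
                rank[v] = len(rank)
    return rank

def sym_diff_fraction(est: List[List[int]], true: List[List[int]]) -> float:
    """Share of pixels in the union over nu < min(m_hat, m0) of R_hat_nu (sym. diff.) R0_nu."""
    rank_est, rank_true = canonical_order(est), canonical_order(true)
    m_tilde = min(len(rank_est), len(rank_true))
    bad = total = 0
    for row_e, row_t in zip(est, true):
        for le, lt in zip(row_e, row_t):
            ie, it = rank_est[le], rank_true[lt]
            total += 1
            # A pixel lies in some R_hat_nu (sym. diff.) R0_nu with nu < m_tilde
            # iff its two region indices differ and at least one is below m_tilde.
            if ie != it and min(ie, it) < m_tilde:
                bad += 1
    return bad / total
```

Because regions are compared only after being ranked, two label grids that describe the same partition with different label values give a fraction of zero.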

The proofs of Theorems 3.1 and 3.2 below can be found in the Appendix. 

Of course, in practice, the assumption that $m^\circ$ is known is unrealistic. Establishing consistency in the general case of unknown $m^\circ$ is, however, substantially more difficult. Even in the simpler univariate change-point frameworks, where independent variables are grouped into segments of identical distributions, only special cases such as normal distributions and exponential families have been thoroughly investigated; see, for example, Lee (1997) and Yao (1988). The reason for this is that sharp tail estimates for maxima of certain squared Gaussian processes are needed, and these do not hold for distributions with thicker tails. See Lemma A.6 below for more details. Nevertheless, if we assume the noise is normally distributed, we are able to establish the following consistency result.

Theorem 3.2. Let $\{y_i\}$ be the sequence of random variables specified in (3) and assume that the $\{\varepsilon_i\}$ are normally distributed. Then

$$\hat m \xrightarrow{P} m^\circ \quad \text{as } n \to \infty$$

and

$$\hat{\mathbf{R}} \xrightarrow{P} \mathbf{R}^\circ \quad \text{as } n \to \infty,$$

even if the true value $m = m^\circ$ is unknown. Here $\xrightarrow{P}$ indicates convergence in probability.

The second convergence in probability is defined as follows. Let now $\hat{\mathbf{R}} \triangle \mathbf{R}^\circ = \bigcup_{\nu=1}^{\tilde m} R_\nu^\circ \triangle \hat R_\nu$, where $\tilde m = \min\{\hat m, m^\circ\}$. Then, in analogy to the almost sure convergence above, we use the terminology $\hat{\mathbf{R}} \xrightarrow{P} \mathbf{R}^\circ$ to mean that $\lim_n P(\{\lambda^2(\mathbf{R}^\circ \triangle \hat{\mathbf{R}}) = 0\}) = 1$. In words, Theorem 3.2 asserts that, if the noise $\varepsilon_i$ is normal, the MDL method is capable of recovering the true number of regions as well as the region boundaries as the number of pixels in the image goes to infinity.

3.2. Consistency of BIC segmentation. The results stated in Theorems 3.1 and 3.2 also hold for the BIC solution given by (8). This statement can be proved by modifying the proofs of Theorems 3.1 and 3.2. Details can be found in the Appendix.

3.3. AIC segmentation is inconsistent. While being consistent in the special case of known $m = m^\circ$, the AIC solution given by (7) is, however, inconsistent in the general case. The main reason is that its penalty term, $2m$, is independent of the sample size $n$ and does not properly adjust for the model complexity. Some details are provided in the Appendix.
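The argument behind Section 3.3 can be previewed in a deliberately simplified setting: a one-dimensional change-point analogue (not the 2D problem of the theorems) in which a single constant segment of pure noise is split at the best of all $n - 1$ positions. The split always lowers $\tfrac n2\ln(\mathrm{RSS}/n)$; the question is whether the drop exceeds the price of one extra region, which is the constant 2 under AIC and $\ln n$ under BIC (assuming the criteria take the forms $2m + \tfrac n2\ln(\mathrm{RSS}_m/n)$ and $m\ln n + \tfrac n2\ln(\mathrm{RSS}_m/n)$). The Python sketch below, with hypothetical function names and Gaussian noise, computes the maximal drop with prefix sums and reports how often each price is exceeded over repeated draws. It is an illustration only, and we make no claim about the exact frequencies it prints.

```python
import math
import random
from typing import List, Tuple

def best_split_gain(y: List[float]) -> float:
    """Max over k of (n/2) * [ln RSS(one segment) - ln RSS(split at k)].

    Prefix sums make each candidate split position cost O(1).
    """
    n = len(y)
    pre, pre2 = [0.0], [0.0]
    for v in y:
        pre.append(pre[-1] + v)
        pre2.append(pre2[-1] + v * v)

    def seg_rss(i: int, j: int) -> float:
        # RSS of y[i:j] about its own mean.
        tot = pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - tot * tot / (j - i)

    rss_one = seg_rss(0, n)
    best = 0.0
    for k in range(1, n):
        rss_two = seg_rss(0, k) + seg_rss(k, n)
        best = max(best, 0.5 * n * (math.log(rss_one) - math.log(rss_two)))
    return best

def exceedance_rates(n: int, reps: int, seed: int = 1) -> Tuple[float, float]:
    """Share of pure-noise draws whose best split beats the AIC and BIC prices."""
    rng = random.Random(seed)
    aic_hits = bic_hits = 0
    for _ in range(reps):
        gain = best_split_gain([rng.gauss(0.0, 1.0) for _ in range(n)])
        aic_hits += gain > 2.0          # AIC: one more region costs 2
        bic_hits += gain > math.log(n)  # BIC: one more region costs ln n
    return aic_hits / reps, bic_hits / reps

print(exceedance_rates(n=1000, reps=200))
```

In two dimensions the number of candidate region shapes is far larger than the $n - 1$ split points here, which is why a penalty that does not grow with $n$ is even harder to defend in the segmentation setting.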
4. Simulation results. Two sets of simulation experiments were conducted to empirically verify the theoretical results presented above.

4.1. Experiment 1. Three test images $f$ were used in the first simulation experiment, and they are displayed in the top row of Figure 1. Recall that the area and perimeter of each region appear explicitly in the MDL penalty (4), but in neither the AIC nor the BIC penalty. To assess the effect of having such quantities in the penalty, the three test images were constructed to have different region areas, perimeters and area-to-perimeter ratios. Test image 1 has seven square regions of two different sizes, with true gray values for some of the adjacent regions being very close. Test image 2 contains eight rectangular regions of the same size, with true gray values increasing from left to right. Test image 3 contains four regions of different sizes and shapes.

Noisy images were generated by adding Gaussian white noise with variance $\sigma^2$ to each of the test images. Three signal-to-noise ratios (snrs) were used, namely 1, 2 and 4, where snr is defined as $\sqrt{\mathrm{var}(f)}/\sigma$. Some typical noisy images are also displayed in Figure 1. Note that for snr = 1 some of the region boundaries are hardly visible. Four image sizes were used, $n = 64^2, 128^2, 256^2$ and $512^2$, and the number of repetitions for each configuration was 500.
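Reading the snr definition as $\sqrt{\mathrm{var}(f)}/\sigma$, the noise level for each configuration is $\sigma = \sqrt{\mathrm{var}(f)}/\mathrm{snr}$. In the Python sketch below, taking var(f) to be the population variance of the true pixel values is our assumption (the text does not say whether a divisor of $n$ or $n - 1$ is used; for images of these sizes the difference is negligible), and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence

def noise_sd_for_snr(f: Sequence[float], snr: float) -> float:
    """sigma such that snr = sqrt(var(f)) / sigma.

    Assumption: var(f) is the population variance of the true pixel values.
    """
    return math.sqrt(statistics.pvariance(f)) / snr

# Higher snr means smaller noise: sigma halves each time snr doubles.
print([noise_sd_for_snr([0.0, 0.0, 2.0, 2.0], s) for s in (1, 2, 4)])
```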

For each noisy image, the AIC, BIC and MDL segmentation solutions (6) to (8) were obtained using the merging algorithm in Lee (2000). To verify the result that $\hat m \xrightarrow{P} m^\circ$ (Theorem 3.2), the number of regions in each segmentation solution was counted and the corresponding frequencies are tabulated in Tables 1 to 3. From these tables the following empirical conclusions can be drawn:

• AIC had a strong tendency to over-estimate $m^\circ$.

• The performance of BIC improved as $n$ increased, although it occasionally over-estimated $m^\circ$.

• For reasonably large snr and $n$, MDL always correctly estimated $m^\circ$.

• For small snr and $n$, MDL under-estimated $m^\circ$. As mentioned before, in such cases some of the region boundaries are hardly visible (see Figure 1).

• When comparing the BIC and MDL results, especially those in Table 3, it seems that having the region area and perimeter in the penalty improved the performance.
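The merging algorithm of Lee (2000) is not reproduced here. As a rough, self-contained illustration of how a criterion such as (7) or (8) can be minimized approximately, the Python sketch below uses a generic greedy scheme of our own choosing: starting from single pixels, it repeatedly merges the pair of 4-adjacent regions whose union increases the RSS least (Ward's merging cost), and among the states with $m \le M$ visited along this path it keeps the one minimizing `penalty(m, n)` $+ \tfrac n2\ln(\mathrm{RSS}_m/n)$. This only works for penalties that depend on $m$ alone (AIC- and BIC-type), it explores a single merge path rather than all segmentations, and it runs in roughly quadratic time, so it is meant for toy images only. The function name and arguments are hypothetical.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

def greedy_merge(y: List[List[float]], M: int,
                 penalty: Callable[[int, int], float]) -> Tuple[int, List[List[int]]]:
    """Greedy bottom-up merging; returns (m_hat, labels) minimising
    penalty(m, n) + (n/2) ln(RSS_m / n) over the states with m <= M on the path.
    Assumes y is not constant (otherwise RSS_m = 0 and the log is undefined)."""
    rows, cols = len(y), len(y[0])
    n = rows * cols
    cnt: Dict[int, int] = {i: 1 for i in range(n)}                        # sizes
    tot: Dict[int, float] = {i: y[i // cols][i % cols] for i in range(n)}  # sums
    parent = list(range(n))                         # union-find over pixels
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        r, c = divmod(i, cols)
        if c + 1 < cols:
            adj[i].add(i + 1)
            adj[i + 1].add(i)
        if r + 1 < rows:
            adj[i].add(i + cols)
            adj[i + cols].add(i)

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def cost(a: int, b: int) -> float:
        # Increase in RSS when regions a and b are merged (Ward's formula).
        d = tot[a] / cnt[a] - tot[b] / cnt[b]
        return d * d * cnt[a] * cnt[b] / (cnt[a] + cnt[b])

    rss, best = 0.0, (math.inf, n, [])
    while True:
        m = len(cnt)
        if m <= M and rss > 0:
            crit = penalty(m, n) + 0.5 * n * math.log(rss / n)
            if crit < best[0]:
                best = (crit, m, [[find(r * cols + c) for c in range(cols)]
                                  for r in range(rows)])
        if m == 1:
            return best[1], best[2]
        # Cheapest merge among all pairs of adjacent regions.
        a, b = min(((p, q) for p in cnt for q in adj[p] if p < q),
                   key=lambda pq: cost(*pq))
        rss += cost(a, b)
        parent[b] = a
        cnt[a] += cnt.pop(b)
        tot[a] += tot.pop(b)
        for nb in adj.pop(b):
            adj[nb].discard(b)
            if nb != a:
                adj[nb].add(a)
                adj[a].add(nb)
```

For instance, `greedy_merge(y, M=10, penalty=lambda m, n: m * math.log(n))` gives a BIC-type fit and `penalty=lambda m, n: 2 * m` an AIC-type one.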

[Figure 1 panels: test image 1, test image 2, test image 3]

Fig. 1. The true test images used in the first numerical experiment (first row), and typical noisy images generated from snr = 1 (second row), 2 (third row) and 4 (last row). All images are plotted with size 256 × 256.

The other major theoretical result that we want to verify is that $\hat{\mathbf{R}}$ converges to $\mathbf{R}^\circ$ (Theorems 3.1 and 3.2). However, this is not as straightforward as verifying $\hat m \xrightarrow{P} m^\circ$, as there is no universally agreed distance metric for measuring the distance between two image partitions $\hat{\mathbf{R}}$ and $\mathbf{R}^\circ$ [although some related work can be found in Baddeley (1992)]. To circumvent this issue, we use a somewhat stricter metric, the mean squared error (MSE), defined as

$$\mathrm{MSE}(\hat{\mathbf{f}}) = \frac{1}{n} \sum_{i=1}^{n} (\hat f_i - f_i)^2.$$

The reason we regard $\mathrm{MSE}(\hat{\mathbf{f}})$ as a stricter metric is that, given that $m^\circ$ is correctly estimated, it is extremely likely that $\hat{\mathbf{R}} = \mathbf{R}^\circ$ when $\mathrm{MSE}(\hat{\mathbf{f}}) = 0$, but not vice versa.
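For completeness, a minimal Python sketch of the two quantities reported in Table 4 for a single noisy image: $\mathrm{MSE}(\hat{\mathbf{f}})$ with the $1/n$ normalization implied by the name mean squared error (our reading of the definition), and the ratio $\{\mathrm{MSE}(\hat{\mathbf{f}})\}^{0.5}/\sigma$. Averaging over the 500 repetitions, and the factor of 1,000 used in Table 4, are left to the caller; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

def mse_and_ratio(f_hat: Sequence[float], f: Sequence[float],
                  sigma: float) -> Tuple[float, float]:
    """MSE(f_hat) = (1/n) sum_i (f_hat_i - f_i)^2 and the ratio {MSE}^0.5 / sigma."""
    n = len(f)
    mse = sum((a - b) ** 2 for a, b in zip(f_hat, f)) / n
    return mse, math.sqrt(mse) / sigma
```

Dividing the root MSE by $\sigma$ makes results comparable across the three snr levels, since the noise level itself changes with snr.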
Table 1

Frequencies of $\hat m$ estimated from the noisy images generated from test image 1 for different combinations of snr and $n$. The value of the true $m^\circ$ is 7







n = 64²


n = 128²


n = 256²


n = 512²


snr $\hat m$


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 
495


500 
499


500 





500 


500 


8 


18 


15 





10 


5 





15 


1 





14 
59








59 








52 








45 








10+ 


423 








428 








427 








441 
489


500 
496


500 
499


500 
500


500 
22


11 





25 


4 





24 


1 





16 
63








79 








65 








52 








10+ 


413 








394 








409 








431 
487


500 





498 


500 


3 


499 


500 





500 


500 


8 


19 


12 





17 


2 





9 


1 





1 








9 


64 








54 








31 








10 








10+ 


414 


1 





429 








457 








489 









The averaged values of $\mathrm{MSE}(\hat{\mathbf{f}})$ and $\{\mathrm{MSE}(\hat{\mathbf{f}})\}^{0.5}/\sigma$ are listed in Table 4, where $\sigma^2$ is the true noise variance. As expected, the larger the image size $n$, the smaller these values are. Also, the corresponding figures from BIC and MDL are substantially smaller than those from AIC for large $n$. For small $n$ and snr, MDL produced poor $\mathrm{MSE}(\hat{\mathbf{f}})$ values, because in these settings MDL under-estimates $m^\circ$.

4.2. Experiment 2. Altogether six test images were used in this second numerical experiment. Compared with the three test images used in the first experiment, the objects in these six images have more complicated shapes; see Figure 2.

We repeated the same testing procedure as above, but only for $n = 256^2$.
For each test image, the averages of the estimated number of regions for AIC, 
BIC and MDL segmentation solutions are tabulated in Table 5. The stan- 
Table 2
Similar to Table 1 but for test image 2. The value of the true m° is 8 







n = 64²


n = 128²


n = 256²


n = 512²


snr $\hat m$


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


1 3 








213 





























4 








276 








124 




















5 





1 


11 








312 
23











57 




















7 





127 











7 








2 











8 


5 


203 





78 


492 





69 


500 


498 


75 


500 


500 


9 


33 


114 





114 


6 





127 








96 








10+ 


462 


32 





308 


2 





304 








329 
126


82 


500 


500 


92 


500 


500 


66 


500 


500 


9 


119 


12 





114 








95 
10+


296 








304 








313 








340 
67


499 


500 


76 


500 


500 


84 


500 


500 


65 


500 


500 


9 


96 


1 





126 








102 








115 








10+ 


337 








298 








314 








320 









dard errors of these averages are also reported. We have also computed the 
averaged values of $\mathrm{MSE}(\hat{\mathbf{f}})$ and $\{\mathrm{MSE}(\hat{\mathbf{f}})\}^{0.5}/\sigma$; they are listed in Table 6.
Empirical conclusions obtainable from these two tables are similar to those 
from the first experiment. A noteworthy observation is that, when snr is not 
large, the tendency for BIC to over-estimate m° is more apparent for these 
new test images, that is, when the object boundaries are more complex. 
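The averages and standard errors in Table 5 summarize the 500 values of $\hat m$ obtained for each configuration. Below is a minimal Python sketch, assuming the usual standard error of a mean (sample standard deviation divided by the square root of the number of repetitions), which the text does not spell out; the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

def mean_and_se(m_hats: Sequence[int]) -> Tuple[float, float]:
    """Average of the estimates m_hat and its standard error s / sqrt(R)."""
    reps = len(m_hats)
    return statistics.mean(m_hats), statistics.stdev(m_hats) / math.sqrt(reps)
```

With 500 repetitions the standard error is about $s/22.4$, which is why the bracketed values in Table 5 are small relative to the averages.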

5. Real image segmentation. Figure 3(a) displays a synthetic aperture radar (SAR) image of a rural area. It is of dimension 250 × 250 and was made available by Dr. E. Attema of the European Space Research and Technology Centre. The image has been log-transformed in order to stabilize the noise variance. It would be useful to segment the image into regions of similar vegetation.
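Why a log transform can stabilize the noise variance is easy to see under a multiplicative speckle model $y = f \cdot \eta$, which is our assumption here (the text does not state the noise model of this SAR image): then $\ln y = \ln f + \ln \eta$, so the spread of $\ln y$ no longer depends on the underlying gray value $f$. The Python sketch below draws $\eta$ from a unit-mean exponential distribution, chosen purely for illustration, and reuses the same draws for two very different values of $f$; the function name is hypothetical.

```python
import math
import random
import statistics

def speckle_variances(f: float, n: int = 5000, seed: int = 0):
    """Sample variances of y = f * eta and of ln y, with eta ~ Exp(1).

    The multiplicative Exp(1) speckle model is an illustrative assumption.
    """
    rng = random.Random(seed)
    y = [f * rng.expovariate(1.0) for _ in range(n)]
    return statistics.variance(y), statistics.variance([math.log(v) for v in y])

raw_dark, log_dark = speckle_variances(1.0)
raw_bright, log_bright = speckle_variances(100.0)
# On the raw scale the variance grows with f**2; on the log scale it does not.
print(raw_bright / raw_dark, log_dark, log_bright)
```

On the log scale the noise becomes additive with a constant variance, which is closer to model (3), although it need not be Gaussian.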

Notice that the image is extremely noisy (i.e., it has a low snr), which makes a good segmentation difficult to obtain. Therefore, we applied the MDL criterion to
Table 3
Similar to Table 1 but for test image 3. The value of the true m° is 4 







n = 64²


n = 128²


n = 256²


n = 512²


snr $\hat m$


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


AIC 


BIC 


MDL 


1 3 








2 





























4 


9 


493 


498 


4 


498 


500 


6 


499 


500 


8 


500 


500 


5 


34 


6 





33 


2 





3.") 


1 





22 








6 


63 


1 





60 








70 








57 








7 


94 








97 








86 








98 








8 


80 








113 








103 








95 








9 


99 








96 








87 








88 








10+ 


121 








97 








113 








132 








2 3 
6


494 


500 
498


500 


5 


499 


500 


3 


500 


500 


5 


29 


6 





22 


2 





28 


1 





24 








6 


69 








70 








58 








71 








7 


92 








97 








87 








85 








8 


102 








92 








124 








85 








9 


78 








91 








87 








80 








10+ 


124 








121 








111 








152 








4 3 
500
499


500 
500


500 
500


500 
29


8 





24 
12








4 








6 


56 








49 








46 








15 








7 


82 








87 








62 








31 








8 


102 








104 








101 








44 








9 


104 








94 








92 








76 








10+ 


123 








137 








184 








328 









segment the image, as the simulation results above suggest that both AIC and BIC would heavily over-segment it. The MDL segmentation result, which consists of 34 regions, is shown in Figure 3(b).

Even though a Gaussian noise assumption may not be appropriate for this SAR image, the MDL criterion produced a reasonable segmentation. The most apparent weaknesses of the segmentation are the roughness of the boundaries (many of which should clearly be straight) and the failure to detect some narrow regions. These weaknesses can be (at least partially) attributed to the noisy nature of the image.

6. Concluding remarks. This paper fills an important gap in the image segmentation literature by providing a systematic investigation into the theoretical properties of some popular information theoretic segmentation methods. It is shown that both the BIC and the MDL segmentation solutions are statistically consistent for recovering the number of objects together with
Table 4

The averaged $\mathrm{MSE}(\hat{\mathbf{f}})$ values (multiplied by 1,000) for each combination of test image, snr and $n$ for the first simulation experiment. Numbers in parentheses are the ratios $\{\mathrm{MSE}(\hat{\mathbf{f}})\}^{0.5}/\sigma$. Boldface indicates the smallest value for each experimental setting

Image  snr  Method  n = 64²             n = 128²            n = 256²             n = 512²
1      1    AIC     18.58 (0.09352)     4.510 (0.04607)     1.090 (0.02265)      0.2756 (0.01139)
            BIC     6.193 (0.05399)     0.9575 (0.02123)    0.2244 (0.01028)     0.05753 (0.005203)
            MDL     31.14 (0.1211)      0.9304 (0.02092)    0.2230 (0.01024)     0.05753 (0.005203)
1      2    AIC     4.196 (0.08887)     1.050 (0.04447)     0.2689 (0.02250)     0.06735 (0.01126)
            BIC     0.9305 (0.04185)    0.2291 (0.02076)    0.05672 (0.01033)    0.01441 (0.005208)
            MDL     0.8783 (0.04066)    0.2236 (0.02052)    0.05630 (0.01029)    0.01441 (0.005208)
1      4    AIC     1.076 (0.09002)     0.2736 (0.04539)    0.06671 (0.02241)    0.01682 (0.01125)
            BIC     0.2472 (0.04314)    0.05934 (0.02114)   0.01424 (0.01035)    0.003550 (0.005170)
            MDL     0.2280 (0.04144)    0.05869 (0.02102)   0.01414 (0.01032)    0.003550 (0.005170)
2      1    AIC     76.23 (0.1894)      6.908 (0.05701)     1.661 (0.02796)      0.4176 (0.01402)
            BIC     112.8 (0.2304)      3.038 (0.03781)     0.6388 (0.01734)     0.1617 (0.008724)
            MDL     472.8 (0.4717)      218.2 (0.3204)      0.8846 (0.02040)     0.1617 (0.008724)
2      2    AIC     6.726 (0.1125)      1.655 (0.05581)     0.4015 (0.02749)     0.1047 (0.01404)
            BIC     3.212 (0.07775)     0.6411 (0.03474)    0.1540 (0.01702)     0.04023 (0.008702)
            MDL     90.07 (0.4118)      0.6411 (0.03474)    0.1540 (0.01702)     0.04023 (0.008702)
2      4    AIC     1.697 (0.1130)      0.4143 (0.05585)    0.1027 (0.02780)     0.02516 (0.01376)
            BIC     0.6276 (0.06874)    0.1552 (0.03419)    0.03902 (0.01714)    0.01000 (0.008678)
            MDL     0.6248 (0.06859)    0.1552 (0.03419)    0.03902 (0.01714)    0.01000 (0.008678)
3      1    AIC     11.88 (0.07476)     2.870 (0.03675)     0.7030 (0.01819)     0.1759 (0.009098)
            BIC     2.078 (0.03127)     0.4024 (0.01376)    0.09679 (0.006749)   0.02367 (0.003338)
            MDL     2.545 (0.03461)     0.3927 (0.01359)    0.09558 (0.006707)   0.02367 (0.003338)
3      2    AIC     2.932 (0.07429)     0.7225 (0.03688)    0.1822 (0.01852)     0.04568 (0.009272)
            BIC     0.4140 (0.02792)    0.1053 (0.01408)    0.02521 (0.006889)   0.006404 (0.003472)
            MDL     0.3915 (0.02715)    0.1028 (0.01391)    0.02494 (0.006852)   0.006404 (0.003472)
3      4    AIC     0.7430 (0.07479)    0.1839 (0.03721)    0.04468 (0.01834)    0.01101 (0.009106)
            BIC     0.1106 (0.02885)    0.02441 (0.01356)   0.005919 (0.006676)  0.001478 (0.003336)
            MDL     0.1041 (0.02799)    0.02415 (0.01348)   0.005919 (0.006676)  0.001478 (0.003336)



their boundaries in an image. These theoretical results are empirically verified by simulation experiments. We also note that our theoretical results can be straightforwardly extended to higher-dimensional problems, such as volumetric or movie segmentation.

The numerical results from the simulation experiments also revealed some discrepancy in the finite sample performance of BIC and MDL, which can be attributed to the fact that the region area and perimeter enter explicitly into the MDL segmentation criterion but not into BIC. These results seem to suggest that, when neither the number of pixels $n$ nor the signal-to-noise ratio (snr) is small, MDL is capable of producing very stable and reliable results. For those cases when both $n$ and snr are small, MDL always under-estimated the number of regions, which led to poor MSE values. However, when one inspects the noisy images that correspond to such cases, one can see that, due to the high noise variance, some of the adjacent regions are hardly distinguishable, which explains the under-estimation by MDL. Overall, the numerical results also suggest that BIC has a tendency to over-estimate the number of regions, and in the high noise variance cases this tendency actually worked in its favor. Considering all these factors, in practice, if the image to be segmented is neither too noisy nor too small, one may consider using MDL; otherwise, one may use BIC.

Fig. 2. The true test images used in the second numerical experiment.

APPENDIX: PROOFS 

This Appendix first provides the proofs of Theorems 3.1 and 3.2 in Appendices A.1 and A.2. Appendix A.3 covers the BIC and AIC procedures.

A.1. Proof of Theorem 3.1. We first provide a number of auxiliary results and will throughout use the following conventions. The true segmentation of $[0,1]^2$ will be denoted by $R_1^\circ, \dots, R_{m^\circ}^\circ$. All other segmentations will be denoted by $R_1, \dots, R_m$, while the MDL-based estimates will be $\hat R_1, \dots, \hat R_m$. Recall that in the situation of Theorem 3.1, the number of segments, $m = m^\circ$, is assumed known.
Table 5

The averaged $\hat m$ values for the second numerical experiment. Numbers in parentheses are estimated standard errors. The true values of $m$ (i.e., $m^\circ$) are listed in square brackets

Image [m°]       Method  snr = 1          snr = 2          snr = 4
Disc [8]         AIC     83.2 (0.274)     69.0 (0.268)     48.2 (0.243)
                 BIC     20.9 (0.165)     16.5 (0.123)     9.94 (0.0689)
                 MDL     6.38 (0.0219)    7.06 (0.0107)    8.05 (0.014)
Hand [8]         AIC     77.8 (0.259)     63.7 (0.247)     39.6 (0.219)
                 BIC     20.4 (0.139)     15.5 (0.106)     9.45 (0.0636)
                 MDL     6.84 (0.0259)    8.05 (0.0245)    8.13 (0.0168)
Human-body [6]   AIC     67.7 (0.268)     47.9 (0.247)     25.3 (0.187)
                 BIC     15.7 (0.130)     8.97 (0.0951)    6.15 (0.0194)
                 MDL     5.04 (0.00964)   6.23 (0.0253)    6.03 (0.00739)
Ring [16]        AIC     81.1 (0.266)     69.6 (0.244)     48.9 (0.218)
                 BIC     24.8 (0.153)     22.1 (0.120)     16.7 (0.0613)
                 MDL     11.2 (0.0279)    13.9 (0.0184)    15.2 (0.0189)
Sunflower [8]    AIC     81.8 (0.289)     67.6 (0.250)     47.8 (0.259)
                 BIC     20.0 (0.153)     15.7 (0.123)     10.2 (0.0939)
                 MDL     6.07 (0.0117)    7.41 (0.0222)    8.15 (0.0224)
Triangle [8]     AIC     75.7 (0.276)     62.6 (0.248)     35.4 (0.220)
                 BIC     18.6 (0.138)     14.5 (0.119)     8.48 (0.0313)
                 MDL     6.97 (0.0101)    7.57 (0.0223)    7.99 (0.00597)


Lemma A.1. Let $y_i=f(x_i)+\varepsilon_i$, $i=1,\dots,n$, be random variables with $f(x)=\mu$ for all $x\in[0,1]^2$ and design points $\Xi_n=\{x_1,\dots,x_n\}\subset[0,1]^2$ satisfying (9). Assume furthermore that $\{\varepsilon_i\}$ is a sequence of independent, identically distributed random variables with zero mean and variance $\sigma^2$. Fix a subset $R\subset[0,1]^2$, and let $a=\#A$ for $A=\{i:x_i\in\Xi_n\cap R\}$. Define the estimators

$$\hat\mu(R)=\frac1a\sum_{i\in A}y_i\qquad\text{and}\qquad\hat\sigma^2(R)=\frac1a\sum_{i\in A}\{y_i-\hat\mu(R)\}^2.$$

Then $\hat\mu(R)\to\mu$ and $\hat\sigma^2(R)\to\sigma^2$ with probability one as $n\to\infty$.

Proof. Notice that the sequence $\{y_i\}$ is globally independent and identically distributed with mean $\mu$ and variance $\sigma^2$, so in particular this holds on any subset $R\subset[0,1]^2$. Both assertions of the lemma therefore follow directly from the strong law of large numbers, after recognizing that $a\to\infty$ as $n\to\infty$ because of (9). □
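As an illustration only, the following Python sketch simulates the setting of Lemma A.1 under assumptions that are not part of the paper: the design points are the centres of a regular $N\times N$ pixel grid, the fixed subset $R$ is a disc of radius 0.3, and the noise is Gaussian. The helper names `region_estimates` and `disc` are hypothetical. As $n=N^2$ grows, $\hat\mu(R)$ and $\hat\sigma^2(R)$ should settle near $\mu$ and $\sigma^2$, and $a/n$ near the area of $R$.

```python
import math
import random


def region_estimates(n_side, mu, sigma, in_region, seed=0):
    """Return (a, mu_hat(R), sigma2_hat(R)) for y_i = mu + eps_i observed at the
    centres x_i of an n_side x n_side grid, restricted to the pixels in R."""
    rng = random.Random(seed)
    ys = []
    for r in range(n_side):
        for c in range(n_side):
            x = ((c + 0.5) / n_side, (r + 0.5) / n_side)  # pixel centre in [0,1]^2
            if in_region(x):
                ys.append(mu + rng.gauss(0.0, sigma))
    a = len(ys)                                          # a = #A
    mu_hat = sum(ys) / a                                 # sample mean over A
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / a  # 1/a-normalised variance
    return a, mu_hat, sigma2_hat


def disc(x):
    """Hypothetical fixed region R: disc of radius 0.3 centred at (0.5, 0.5)."""
    return (x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2 <= 0.3 ** 2


for n_side in (16, 64, 256):
    a, m, s2 = region_estimates(n_side, mu=2.0, sigma=1.0, in_region=disc)
    print(f"n={n_side ** 2:6d}  a={a:6d}  a/n={a / n_side ** 2:.4f} "
          f"(area={math.pi * 0.09:.4f})  mu_hat={m:.4f}  sigma2_hat={s2:.4f}")
```

The grid design is chosen only because it makes $a=\#A$ grow proportionally to $n$, which is the role condition (9) plays in the proof.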

Lemma A.2. Let $\{y_i\}$ be the sequence of random variables defined in (3). Fix a subset $R\subset[0,1]^2$ and denote by $\hat\mu(R)$ the sample mean defined in Lemma A.1. Then $\hat\mu(R)\to\mu^*(R)$ with probability one, where the limit $\mu^*(R)$ is defined in (10) below.
Table 6
The averaged MSE(f) values (multiplied by 1,000) for each combination of test image and snr. Numbers in parentheses are the ratios $\{\mathrm{MSE}(f)\}^{0.5}/\sigma$. Boldface indicates the smallest value for each experimental setting.

| Image | Criterion | snr = 1 | snr = 2 | snr = 4 |
|---|---|---|---|---|
| Disc | AIC | 475.4 (0.2333) | 81.55 (0.1932) | 10.44 (0.1383) |
| | BIC | **405.7** (0.2155) | **65.78** (0.1735) | 7.811 (0.1196) |
| | MDL | 428.7 (0.2215) | 69.56 (0.1784) | **7.763** (0.1192) |
| Hand | AIC | 504.9 (0.2950) | 79.51 (0.2342) | 10.75 (0.1722) |
| | BIC | **465.3** (0.2832) | **70.62** (0.2207) | **9.522** (0.1621) |
| | MDL | 485.4 (0.2893) | 71.22 (0.2216) | 9.853 (0.1649) |
| Human-body | AIC | 135.3 (0.2443) | 19.82 (0.1870) | 1.491 (0.1026) |
| | BIC | **119.9** (0.2300) | **17.17** (0.1741) | **1.208** (0.09234) |
| | MDL | 120.9 (0.2309) | 17.53 (0.1759) | 1.217 (0.09269) |
| Ring | AIC | 541.1 (0.2774) | 81.89 (0.2158) | 11.00 (0.1582) |
| | BIC | **493.2** (0.2648) | **70.89** (0.2008) | **9.314** (0.1456) |
| | MDL | 520.8 (0.2721) | 73.74 (0.2048) | 9.572 (0.1476) |
| Sunflower | AIC | 527.3 (0.2517) | 89.43 (0.2073) | 12.98 (0.1580) |
| | BIC | **464.1** (0.2362) | **74.97** (0.1898) | **10.54** (0.1423) |
| | MDL | 488.3 (0.2422) | 83.32 (0.2001) | 10.74 (0.1437) |
| Triangle | AIC | 219.8 (0.2165) | 32.64 (0.1668) | 3.242 (0.1051) |
| | BIC | 182.6 (0.1973) | 24.84 (0.1455) | **2.326** (0.08906) |
| | MDL | **168.9** (0.1897) | **23.50** (0.1416) | 2.353 (0.08957) |


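Fig. 3. Real image segmentation: (a) observed SAR image; (b) MDL segmented result.

Proof. Utilizing the true segmentation, we can write

$$R=\bigcup_{\nu=1}^{m^\circ}R\cap R_\nu^\circ=\bigcup_{\ell=1}^{2}\bigcup_{\nu\in\mathcal I_\ell}R\cap R_\nu^\circ,$$

where $\mathcal I_1=\{\nu:R_\nu^\circ\subset R\}$ and $\mathcal I_2=\{\nu:R\cap R_\nu^\circ\neq\emptyset\}\setminus\mathcal I_1$, thus ignoring those $\nu$ for which $R\cap R_\nu^\circ=\emptyset$ on the right-hand side of the last display. Define $\tilde a_\nu^\circ=\#\tilde A_\nu^\circ$ for $\tilde A_\nu^\circ=\{i:x_i\in\Xi_n\cap R\cap R_\nu^\circ\}$ and $a_\nu^\circ=\#A_\nu^\circ$ for $A_\nu^\circ=\{i:x_i\in\Xi_n\cap R_\nu^\circ\}$. It follows from an application of Lemma A.1 that

$$\hat\mu(R)\to\frac1\alpha\Big(\sum_{\nu\in\mathcal I_1}\alpha_\nu^\circ\mu_\nu^\circ+\sum_{\nu\in\mathcal I_2}\tilde\alpha_\nu^\circ\mu_\nu^\circ\Big)=\mu^*(R)\tag{10}$$

with probability one as $n\to\infty$, on account of (9) and by assumption on the representation of the number of design points in any given region ($a=\lfloor\alpha n\rfloor$, and analogously for $a_\nu^\circ$ and $\tilde a_\nu^\circ$). □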
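The limit (10) is an area-weighted average of the true means of the regions that $R$ meets. The sketch below checks this numerically in a hypothetical configuration chosen for illustration (not one of the paper's test images): the true segmentation consists of three vertical strips of $[0,1]^2$ with means 0, 1 and 3, $R=[0.2,1]\times[0,1]$ contains the second and third strips completely ($\nu\in\mathcal I_1$) and meets the first only partially ($\nu\in\mathcal I_2$), and the noise is Gaussian. Here $\alpha=0.8$ and $\mu^*(R)=\{(2/15)\cdot0+(1/3)\cdot1+(1/3)\cdot3\}/0.8=5/3$. The names `mu_star` and `mu_hat` are hypothetical helpers.

```python
import random

# Hypothetical truth (illustration only): three vertical strips with these means.
TRUE_MEANS = (0.0, 1.0, 3.0)
EDGES = (0.0, 1 / 3, 2 / 3, 1.0)


def mu_star(lo, hi):
    """Limit (10) for R = [lo, hi] x [0, 1]: the true means weighted by the area
    of R ∩ R_nu^o (alpha_nu^o or tilde-alpha_nu^o), divided by alpha."""
    total, alpha = 0.0, 0.0
    for k, mu in enumerate(TRUE_MEANS):
        overlap = max(0.0, min(hi, EDGES[k + 1]) - max(lo, EDGES[k]))
        total += overlap * mu
        alpha += overlap
    return total / alpha


def mu_hat(lo, hi, n_side, sigma=1.0, seed=0):
    """Sample mean of y_i over the pixel centres of an n_side x n_side grid in R."""
    rng = random.Random(seed)
    ys = []
    for c in range(n_side):
        x1 = (c + 0.5) / n_side                    # horizontal pixel coordinate
        if lo <= x1 <= hi:
            k = sum(x1 >= e for e in EDGES[1:-1])  # index of the true strip
            ys.extend(TRUE_MEANS[k] + rng.gauss(0.0, sigma) for _ in range(n_side))
    return sum(ys) / len(ys)


print(mu_star(0.2, 1.0))  # 5/3 up to floating-point rounding
for n_side in (30, 100, 300):
    print(n_side ** 2, mu_hat(0.2, 1.0, n_side))
```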



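Lemma A.3. Let $\{y_i\}$ be the sequence of random variables defined in (3). Fix a subset $R\subset[0,1]^2$ and denote by $\hat\sigma^2(R)$ the variance estimator defined in Lemma A.1. Then $\hat\sigma^2(R)\to\sigma^2+\sigma_*^2(R)$ with probability one, where $\sigma_*^2(R)$ is defined in (11) below.

Proof. Using the notation of the proof of Lemma A.2 and applying similar arguments yields the decomposition

$$\sum_{i\in A}\{y_i-\hat\mu(R)\}^2=\sum_{\nu\in\mathcal I_1}\sum_{i\in A_\nu^\circ}\{y_i-\hat\mu(R)\}^2+\sum_{\nu\in\mathcal I_2}\sum_{i\in\tilde A_\nu^\circ}\{y_i-\hat\mu(R)\}^2.$$

Let first $\nu\in\mathcal I_1$. By definition of $\mathcal I_1$, $R_\nu^\circ$ is completely contained in $R$. Therefore, adding and subtracting the true value $\mu_\nu^\circ$ in each of the terms $y_i-\hat\mu(R)$ and subsequently expanding the square leads to

$$\frac1a\sum_{i\in A_\nu^\circ}\{y_i-\hat\mu(R)\}^2=\frac1a\sum_{i\in A_\nu^\circ}(y_i-\mu_\nu^\circ)^2+\frac2a\sum_{i\in A_\nu^\circ}(y_i-\mu_\nu^\circ)\{\mu_\nu^\circ-\hat\mu(R)\}+\frac{a_\nu^\circ}{a}\{\mu_\nu^\circ-\hat\mu(R)\}^2=S_1+S_2+S_3.$$

Lemma A.1 implies for the first term that

$$S_1=\frac{a_\nu^\circ}{a}\cdot\frac1{a_\nu^\circ}\sum_{i\in A_\nu^\circ}(y_i-\mu_\nu^\circ)^2\to\frac{\alpha_\nu^\circ}{\alpha}\sigma^2\qquad\text{a.s. }(n\to\infty).$$

The second term $S_2$ is asymptotically negligible with probability one. To see this, observe that, by Lemma A.2, $\mu_\nu^\circ-\hat\mu(R)$ converges a.s. to $M_\nu^\circ=\mu_\nu^\circ-\mu^*(R)$ as $n\to\infty$. For two sequences $\{\xi_n\}$ and $\{\zeta_n\}$ of real numbers, write $\xi_n\sim\zeta_n$ if $\lim_{n\to\infty}\xi_n\zeta_n^{-1}=1$. Then, using the strong law of large numbers for the i.i.d. sequence $\{\varepsilon_i\}$, we obtain that

$$S_2\sim\frac{2M_\nu^\circ}{a}\sum_{i\in A_\nu^\circ}(y_i-\mu_\nu^\circ)=\frac{2M_\nu^\circ}{a}\sum_{i\in A_\nu^\circ}\varepsilon_i\to0\qquad\text{a.s. }(n\to\infty).$$

Finally, by Lemma A.2,

$$S_3=\frac{a_\nu^\circ}{a}\{\mu_\nu^\circ-\hat\mu(R)\}^2\to\frac{\alpha_\nu^\circ}{\alpha}\{\mu_\nu^\circ-\mu^*(R)\}^2\qquad\text{a.s. }(n\to\infty).$$

Let now $\nu\in\mathcal I_2$. Then the region $R_\nu^\circ$ of the true segmentation is only partially contained in $R$. This means that, while all computations can be performed along the blueprint for the case $\nu\in\mathcal I_1$, the quantities $\tilde a_\nu^\circ$, $\tilde\alpha_\nu^\circ$ and $\tilde A_\nu^\circ$ have to be used in place of their respective counterparts $a_\nu^\circ$, $\alpha_\nu^\circ$ and $A_\nu^\circ$. Combining these results, we arrive at the almost sure convergence

$$\hat\sigma^2(R)\to\frac1\alpha\Big(\sum_{\nu\in\mathcal I_1}\alpha_\nu^\circ+\sum_{\nu\in\mathcal I_2}\tilde\alpha_\nu^\circ\Big)\sigma^2+\sigma_*^2(R)=\sigma^2+\sigma_*^2(R),$$

where

$$\sigma_*^2(R)=\frac1\alpha\Big[\sum_{\nu\in\mathcal I_1}\alpha_\nu^\circ\{\mu_\nu^\circ-\mu^*(R)\}^2+\sum_{\nu\in\mathcal I_2}\tilde\alpha_\nu^\circ\{\mu_\nu^\circ-\mu^*(R)\}^2\Big],\tag{11}$$

since $\sum_{\nu\in\mathcal I_1}\alpha_\nu^\circ+\sum_{\nu\in\mathcal I_2}\tilde\alpha_\nu^\circ=\alpha$. This proves the assertion. □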
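Limit (11) is the familiar within/between decomposition of a variance: the noise variance $\sigma^2$ plus the spread $\sigma_*^2(R)$ of the true means around $\mu^*(R)$. A sketch reusing the same hypothetical three-strip configuration as an illustration (means 0, 1, 3; $R=[0.2,1]\times[0,1]$; Gaussian noise, here with $\sigma=0.5$), for which $\sigma_*^2(R)=25/18$, might look as follows. It is not part of the proof, and the helper names are hypothetical.

```python
import random

# Hypothetical truth (illustration only): three vertical strips with these means.
TRUE_MEANS = (0.0, 1.0, 3.0)
EDGES = (0.0, 1 / 3, 2 / 3, 1.0)


def overlaps(lo, hi):
    """Areas of R ∩ R_nu^o for R = [lo, hi] x [0, 1]."""
    return [max(0.0, min(hi, EDGES[k + 1]) - max(lo, EDGES[k]))
            for k in range(len(TRUE_MEANS))]


def sigma_star_sq(lo, hi):
    """Between-region term sigma_*^2(R) of (11)."""
    w = overlaps(lo, hi)
    alpha = sum(w)
    mu_star = sum(wk * mk for wk, mk in zip(w, TRUE_MEANS)) / alpha
    return sum(wk * (mk - mu_star) ** 2 for wk, mk in zip(w, TRUE_MEANS)) / alpha


def sigma2_hat(lo, hi, n_side, sigma, seed=0):
    """(1/a) * sum over R of (y_i - mu_hat(R))^2 on an n_side x n_side grid."""
    rng = random.Random(seed)
    ys = []
    for c in range(n_side):
        x1 = (c + 0.5) / n_side
        if lo <= x1 <= hi:
            k = sum(x1 >= e for e in EDGES[1:-1])  # index of the true strip
            ys.extend(TRUE_MEANS[k] + rng.gauss(0.0, sigma) for _ in range(n_side))
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


sigma = 0.5
print("limit sigma^2 + sigma_*^2(R):", sigma ** 2 + sigma_star_sq(0.2, 1.0))
print("sigma2_hat(R), n = 300^2:   ", sigma2_hat(0.2, 1.0, 300, sigma))
print("R inside a single strip:    ", sigma_star_sq(0.4, 0.6))  # no bias term
```

When $R$ lies inside one true region the between-region term vanishes, which is exactly the first case used in the proof of Lemma A.4.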

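Lemma A.4. Let $\{y_i\}$ be the sequence of random variables defined in (3). Let $\epsilon>0$ be such that, for appropriately chosen $z_\nu\in R_\nu$ in a segmentation $\mathbf R=(R_1,\dots,R_m)$,

$$B_\epsilon(z_\nu)\subset R_\nu\qquad\text{for all }\nu=1,\dots,m=m^\circ.\tag{12}$$

Let $\mathcal R_\epsilon=\{\mathbf R:\bigcup_\nu R_\nu\text{ satisfying (12) such that }a_\nu=\lfloor n\alpha_\nu\rfloor,\ \sum_\nu\alpha_\nu=1\}$. Then

$$\hat{\mathbf R}=\mathop{\mathrm{argmin}}_{\mathbf R\in\mathcal R_\epsilon}\frac2n\mathrm{MDL}(m^\circ,\mathbf R)\to\mathbf R^\circ\qquad\text{a.s. }(n\to\infty),$$

where $\mathbf R^\circ$ denotes the true segmentation of $[0,1]^2$.

Proof. Assume that the MDL estimator is not strongly consistent, so that $\hat{\mathbf R}$ does not converge with probability one to $\mathbf R^\circ$ as $n\to\infty$. By boundedness, there exists a monotonically increasing subsequence $\{n_j\}$ along which $\hat{\mathbf R}_{n_j}\to\mathbf R^*$ with probability one, with the limit $\mathbf R^*$ being a member of $\mathcal R_\epsilon$, and $\lambda^2(\mathbf R^*\triangle\mathbf R^\circ)>0$ with probability one. Note that we must also have $\hat\alpha_\nu\to\alpha_\nu^*$ along the same subsequence. Note further that, with probability one, $\frac2n\mathrm{MDL}(m^\circ,\mathbf R)\sim\ln(\frac1n\mathrm{RSS}_{m^\circ})$, where $\sim$ is defined in the proof of Lemma A.3, and that, for $\mathbf R=\mathbf R^*$,

$$\frac1n\mathrm{RSS}_{m^\circ}=\frac1n\sum_{\nu=1}^{m^\circ}\sum_{i\in A_\nu}\{y_i-\hat\mu(R_\nu^*)\}^2,$$

adopting notation from before. For any $\nu$, there are now two options: either $R_\nu^*$ is contained in a region of the true segmentation, or $R_\nu^*$ has nontrivial intersections with more than one region of the true segmentation. In the first case, $R_\nu^*\subset R_\kappa^\circ$ for some $\kappa$. Hence, Lemma A.1 implies that

$$\frac1n\sum_{i\in A_\nu}\{y_i-\hat\mu(R_\nu^*)\}^2\to\alpha_\nu^*\sigma^2\qquad\text{a.s. }(n\to\infty).$$

In the second case, $R_\nu^*=\bigcup_\kappa R_\nu^*\cap R_\kappa^\circ$, where the disjoint union contains at least two nonempty elements. Then Lemma A.3 yields that

$$\frac1n\sum_{i\in A_\nu}\{y_i-\hat\mu(R_\nu^*)\}^2\to\alpha_\nu^*\{\sigma^2+\sigma_*^2(R_\nu^*)\}\qquad\text{a.s. }(n\to\infty).$$

Set $\sigma_*^2=\sum_\nu\alpha_\nu^*\sigma_*^2(R_\nu^*)$ with $\sigma_*^2(R_\nu^*)$ as in Lemma A.3 (this term vanishes in the first case). Observe that, on account of $\mathbf R^*\neq\mathbf R^\circ$ [in the sense that $\lambda^2(\mathbf R^*\triangle\mathbf R^\circ)\neq0$ almost surely], we have $\sigma_*^2>0$. On the other hand, $\sigma_*^2=0$ if the true segmentation $\mathbf R^\circ$ were used. Consequently, exploiting the continuity and strict concavity of the logarithm, we arrive at

$$\lim_{n\to\infty}\frac2n\mathrm{MDL}(m^\circ,\mathbf R^*)=\ln(\sigma^2+\sigma_*^2)>\ln\sigma^2=\lim_{n\to\infty}\frac2n\mathrm{MDL}(m^\circ,\mathbf R^\circ)\ge\lim_{n\to\infty}\frac2n\mathrm{MDL}(m^\circ,\mathbf R^*),$$

where the limits are taken along $\{n_j\}$ and the last inequality holds because $\hat{\mathbf R}$ minimizes the criterion and $\hat{\mathbf R}_{n_j}\to\mathbf R^*$. This is a contradiction. Hence, $\hat{\mathbf R}$ is strongly consistent for $\mathbf R^\circ$. □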
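To see the mechanism of Lemma A.4 at work, one can use a deliberately simplified analogue that is not the paper's MDL search: the true segmentation has $m^\circ=2$ regions separated by the vertical line $x_1=0.5$ (means 0 and 1, Gaussian noise), the candidate family is restricted to vertical two-region splits, and only the leading term $\ln(\mathrm{RSS}/n)$ of $\frac2n\mathrm{MDL}(m^\circ,\mathbf R)$ is minimized; the area and perimeter terms, which vanish after multiplication by $2/n$, are dropped. The function name `best_vertical_split` is hypothetical.

```python
import math
import random


def best_vertical_split(n_side, t_true=0.5, means=(0.0, 1.0), sigma=1.0, seed=0):
    """Simulate an n_side x n_side image whose true boundary is the vertical line
    x1 = t_true and return the split position t_hat minimising ln(RSS / n) over
    all vertical two-region segmentations (a restricted, hypothetical family)."""
    rng = random.Random(seed)
    n = n_side * n_side
    col_sum = [0.0] * n_side            # per-column sums of y
    total_sq = 0.0                      # sum of y^2 over the whole image
    for c in range(n_side):
        mu = means[0] if (c + 0.5) / n_side < t_true else means[1]
        for _ in range(n_side):
            y = mu + rng.gauss(0.0, sigma)
            col_sum[c] += y
            total_sq += y * y
    total_sum = sum(col_sum)
    best_crit, best_k, left_sum = math.inf, None, 0.0
    for k in range(1, n_side):          # left region = columns 0..k-1
        left_sum += col_sum[k - 1]
        right_sum = total_sum - left_sum
        a_left, a_right = k * n_side, (n_side - k) * n_side
        # RSS of the piecewise-constant least-squares fit on the two regions
        rss = total_sq - left_sum ** 2 / a_left - right_sum ** 2 / a_right
        crit = math.log(rss / n)
        if crit < best_crit:
            best_crit, best_k = crit, k
    return best_k / n_side


for n_side in (10, 50, 200):
    print(n_side ** 2, best_vertical_split(n_side, seed=1))
```

Any misplaced split lumps a strip of pixels with the wrong mean into a region, which inflates $\mathrm{RSS}/n$ by a between-region term of the kind $\sigma_*^2$ above; that is what drives $\hat t$ toward the true boundary as $n$ grows.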
A.2. Proof of Theorem 3.2. 

Lemma A.5. Let $\{y_i\}$ be the sequence of random variables defined in (3). If

$$(\hat m,\hat{\mathbf R})=\mathop{\mathrm{argmin}}_{m\le M,\ \mathbf R\in\mathcal R_\epsilon}\frac2n\mathrm{MDL}(m,\mathbf R),$$

then $P(\hat m\ge m^\circ)\to1$ as $n\to\infty$.

Proof. Notice that it follows from the proof of Lemma A.4 that $\frac1n\mathrm{RSS}_{m^\circ}\to\sigma^2$ with probability one, provided the true segmentation $\mathbf R^\circ$ is used in the computations. If $\hat m<m^\circ$, then there is at least one $\hat R_\nu$ containing two or more true regions $R_\kappa^\circ$. It follows as in the proofs of Lemmas A.3 and A.4 that $P(\frac1n\mathrm{RSS}_{\hat m}>\sigma^2+\epsilon)\to1$ as $n\to\infty$ for a suitably chosen $\epsilon>0$. This implies the claim. □

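Lemma A.6. Let $\{y_i\}$ be the sequence of random variables defined in (3). If $m^\circ<m<M$, then, for all $\nu=1,\dots,m^\circ$,

$$P\{\hat{\mathbf R}\in\mathcal C_\nu^\circ(n)\}\to0\qquad(n\to\infty),$$

where $\mathcal C_\nu^\circ(n)=\{\mathbf R=(R_1,\dots,R_m):\partial R_\kappa\not\subset\partial R_\nu^\circ+B_{\epsilon(n)}(0),\ \kappa=1,\dots,m\}$.

Proof. Fix $1\le\nu\le m^\circ$, and let $\mathbf R\in\mathcal C_\nu^\circ(n)$. Because of the continuity of $\partial R_\nu^\circ$, there is a $z_\nu\in\partial R_\nu^\circ$ such that $\partial R_\kappa\cap B_{\epsilon(n)}(z_\nu)=\emptyset$ for all $\kappa=1,\dots,m$.

Define $\tilde{\mathbf R}$ as the segmentation that includes all regions of the form

$$R_\kappa\cap R_{\nu'}^\circ\cap B_{\epsilon(n)}^c(z_\nu),\qquad\kappa=1,\dots,m;\ \nu'=1,\dots,m^\circ,$$

and $B_{\epsilon(n)}(z_\nu)$. Clearly, $\mathrm{RSS}(\mathbf R)\ge\mathrm{RSS}(\tilde{\mathbf R})$, where $\mathrm{RSS}(\mathbf R)$ and $\mathrm{RSS}(\tilde{\mathbf R})$ denote the residual sums of squares based on the respective segmentations $\mathbf R$ and $\tilde{\mathbf R}$. Decomposing according to the true segmentation $\mathbf R^\circ$ leads to comparisons of the following types. Consider first the case $R_{\nu'}^\circ\cap B_{\epsilon(n)}(z_\nu)=\emptyset$. Then it follows as in Lemma 4 of Yao (1988) that

$$0\le\sum_{i\in A_{\nu'}^\circ}\varepsilon_i^2-\sum_{\kappa\in\mathcal I_{\nu'}}\sum_{i\in A_\kappa}\{y_i-\hat\mu(R_\kappa)\}^2=O_P(\ln n)\qquad(n\to\infty),$$

where $\mathcal I_{\nu'}=\{\kappa:R_\kappa\subset R_{\nu'}^\circ\}$, $A_{\nu'}^\circ=\{i:x_i\in\Xi_n\cap R_{\nu'}^\circ\}$ and $A_\kappa=\{i:x_i\in\Xi_n\cap R_\kappa\}$. The rate on the right-hand side of the last display explicitly uses that the noise $\{\varepsilon_i\}$ follows a normal law; it need not hold for arbitrary noise distributions [compare the remark on page 188 of Yao (1988)]. Consider next the case $R_{\nu'}^\circ\cap B_{\epsilon(n)}(z_\nu)\neq\emptyset$. Observe that the number of design points in $B_{\epsilon(n)}(z_\nu)$ is proportional to $\ln n$, while the number of design points in any $R_\nu$ is proportional to the sample size $n$. Any region of $\tilde{\mathbf R}$ obtained from a nontrivial intersection with $B_{\epsilon(n)}^c(z_\nu)$ therefore has its number of elements reduced by an amount proportional to $\ln n$. This, however, is negligible compared to $n$ in the long run. Therefore, the same arguments as before also imply that

$$0\le\sum_{i\in C_{\nu'}^\circ}\varepsilon_i^2-\sum_{\kappa\in\mathcal J_{\nu'}}\sum_{i\in A_\kappa}\{y_i-\hat\mu(R_\kappa)\}^2=O_P(\ln n)\qquad(n\to\infty),$$

where $C_{\nu'}^\circ=A_{\nu'}^\circ\setminus B_{\nu'}$ with $B_{\nu'}=\{i:x_i\in\Xi_n\cap B_{\epsilon(n)}(z_\nu)\cap R_{\nu'}^\circ\}$, and $\mathcal J_{\nu'}=\{\kappa:R_\kappa\subset R_{\nu'}^\circ\cap B_{\epsilon(n)}^c(z_\nu)\}$. It remains to investigate the region $B_{\epsilon(n)}(z_\nu)$ itself.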




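Without loss of generality, assume that $B_{\epsilon(n)}(z_\nu)$ intersects, apart from $R_\nu^\circ$, only one more true region $R_{\nu'}^\circ$, as the general case can be handled in a similar fashion. Notice that $b=\#\{B_{\epsilon(n)}(z_\nu)\cap\Xi_n\}=\lfloor\beta n\rfloor\propto\ln n$ by definition. Let furthermore $b_\nu=\#\{\Xi_n\cap R_\nu^\circ\cap B_{\epsilon(n)}(z_\nu)\}$ and $b_{\nu'}=\#\{\Xi_n\cap R_{\nu'}^\circ\cap B_{\epsilon(n)}(z_\nu)\}$. Then we must have $b_\nu=\lfloor\beta_\nu n\rfloor\propto\ln n$ and $b_{\nu'}=\lfloor\beta_{\nu'}n\rfloor\propto\ln n$ for appropriate $\beta_\nu$ and $\beta_{\nu'}$ satisfying $\beta_\nu+\beta_{\nu'}=\beta$. Now, utilizing that $y_i-\hat\mu(B_{\epsilon(n)}(z_\nu))=\varepsilon_i+\mu_\nu^\circ-\hat\mu(B_{\epsilon(n)}(z_\nu))$ on $R_\nu^\circ$ and $y_i-\hat\mu(B_{\epsilon(n)}(z_\nu))=\varepsilon_i+\mu_{\nu'}^\circ-\hat\mu(B_{\epsilon(n)}(z_\nu))$ on $R_{\nu'}^\circ$, we obtain that

$$\frac1b\sum_{i\in B_\nu^*}\big[\varepsilon_i^2-\{y_i-\hat\mu(B_{\epsilon(n)}(z_\nu))\}^2\big]=-\frac1b\Big[b_\nu\{\mu_\nu^\circ-\hat\mu(B_{\epsilon(n)}(z_\nu))\}^2+b_{\nu'}\{\mu_{\nu'}^\circ-\hat\mu(B_{\epsilon(n)}(z_\nu))\}^2\Big]+o(1)\to-\frac{\beta_\nu\beta_{\nu'}}{\beta^2}(\mu_\nu^\circ-\mu_{\nu'}^\circ)^2$$

with probability one as $n\to\infty$, where $B_\nu^*=\{i:x_i\in\Xi_n\cap B_{\epsilon(n)}(z_\nu)\}$ and the limit is clearly negative. Combining the results in the last three displays, we consequently arrive at

$$\frac1b\{\mathrm{RSS}-\mathrm{RSS}(\tilde{\mathbf R})\}\to s<0,$$

where $\mathrm{RSS}=\sum_{i=1}^n\varepsilon_i^2$. Thus,

$$\lim_{n\to\infty}\min_{\mathbf R\in\mathcal C_\nu^\circ(n)}\mathrm{RSS}(\mathbf R)>\lim_{n\to\infty}\mathrm{RSS}\ge\lim_{n\to\infty}\mathrm{RSS}(\mathbf R^\circ)$$

with probability approaching one. This implies the assertion. □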
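The negative limit in the proof rests on an elementary identity: if a ball containing $b=b_\nu+b_{\nu'}$ design points is fitted by a single pooled mean $\bar\mu$, the noise-free excess $\frac1b[b_\nu(\mu_\nu^\circ-\bar\mu)^2+b_{\nu'}(\mu_{\nu'}^\circ-\bar\mu)^2]$ equals $(b_\nu b_{\nu'}/b^2)(\mu_\nu^\circ-\mu_{\nu'}^\circ)^2$, which tends to $\beta_\nu\beta_{\nu'}/\beta^2$ times the squared jump; the display above is minus this quantity. The sketch below verifies the identity in exact rational arithmetic. It deliberately omits the noise, whose contribution is the $o(1)$ term, and the function names are hypothetical.

```python
from fractions import Fraction


def straddle_excess(b_nu, b_nu2, mu_nu, mu_nu2):
    """Noise-free (1/b)[b_nu (mu_nu - m)^2 + b_nu2 (mu_nu2 - m)^2], where m is
    the pooled mean of the b = b_nu + b_nu2 points in the ball."""
    b = b_nu + b_nu2
    m = Fraction(b_nu * mu_nu + b_nu2 * mu_nu2, b)
    return (b_nu * (mu_nu - m) ** 2 + b_nu2 * (mu_nu2 - m) ** 2) / b


def closed_form(b_nu, b_nu2, mu_nu, mu_nu2):
    """(b_nu * b_nu2 / b^2) * (mu_nu - mu_nu2)^2, the finite-n version of
    (beta_nu * beta_nu' / beta^2) * (mu_nu - mu_nu')^2."""
    b = b_nu + b_nu2
    return Fraction(b_nu * b_nu2, b * b) * (mu_nu - mu_nu2) ** 2


print(straddle_excess(30, 10, 0, 2), closed_form(30, 10, 0, 2))  # prints: 3/4 3/4
```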

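Lemma A.7. Let $\{y_i\}$ be the sequence of random variables defined in (3). If $m^\circ<m<M$ and $\epsilon>0$, then

$$P\{\mathrm{RSS}-\mathrm{RSS}(\hat{\mathbf R})\in[0,L_n(\epsilon,\hat{\mathbf R})]\}\to1\qquad(n\to\infty),$$

where $\mathrm{RSS}=\sum_{i=1}^n\varepsilon_i^2$, $\mathrm{RSS}(\hat{\mathbf R})$ is the residual sum of squares based on the segmentation $\hat{\mathbf R}=(\hat R_1,\dots,\hat R_m)$ selected by the MDL criterion, and $L_n(\epsilon,\hat{\mathbf R})=\sigma^2\{\epsilon+2(m-m^\circ-1)(1+\epsilon)\}\ln n$.

Proof. It follows from Lemma A.6 that $\hat{\mathbf R}\in\mathcal B^\circ(n)=\bigcap_{\nu=1}^{m^\circ}[\mathcal C_\nu^\circ(n)]^c$ with probability approaching one. It is therefore sufficient to verify the claim for an arbitrary segmentation $\mathbf R\in\mathcal B^\circ(n)$. Given such an $\mathbf R$, introduce the finer segmentation $\tilde{\mathbf R}$ containing the regions

$$R_\kappa\cap R_{\nu'}^\circ\cap[B^\circ(n)]^c,\qquad\kappa=1,\dots,m;\ \nu'=1,\dots,m^\circ,\tag{13}$$

and

$$R_\kappa\cap R_{\nu'}^\circ\cap B_\nu^\circ(n),\qquad\kappa=1,\dots,m;\ \nu,\nu'=1,\dots,m^\circ.\tag{14}$$

Denote the collection of regions (13) by $\tilde{\mathbf R}_1$ and the collection of regions (14) by $\tilde{\mathbf R}_2$. We then have $\mathrm{RSS}\ge\mathrm{RSS}(\mathbf R)\ge\mathrm{RSS}(\tilde{\mathbf R})=\mathrm{RSS}(\tilde{\mathbf R}_1)+\mathrm{RSS}(\tilde{\mathbf R}_2)$. The number of design points in $\tilde{\mathbf R}_2$ is, by definition of the sets $\mathcal C_\nu^\circ(n)$, proportional to $\ln n$. An application of Lemma 1 in Yao (1988) therefore yields that

$$\sum_{i:\,x_i\in\tilde{\mathbf R}_2}\varepsilon_i^2-\mathrm{RSS}(\tilde{\mathbf R}_2)=O_P(\ln\ln n)\qquad(n\to\infty).$$

For $\tilde R_\nu\in\tilde{\mathbf R}_1$, let $a_\nu=\#\tilde R_\nu$. Since $\mathbf R\in\mathcal B^\circ(n)$, it holds that $\#\tilde{\mathbf R}_1\le m$.

As in (17)-(19) of Yao (1988), we therefore conclude with Theorem 2 of Darling and Erdős (1956) that, for any $\epsilon>0$ and with probability approaching one,

$$\sum_{\tilde R_\nu\in\tilde{\mathbf R}_1}\sum_{i\in A_\nu}\varepsilon_i^2\ge\mathrm{RSS}(\tilde{\mathbf R}_1)\ge\sum_{\tilde R_\nu\in\tilde{\mathbf R}_1}\sum_{i\in A_\nu}\varepsilon_i^2-L_n(\epsilon,\mathbf R).$$

This completes the proof. □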

Lemma A.8. Let $\{y_i\}$ be the sequence of random variables defined in (3). If $m>m^\circ$, then, using the notation of (4), it holds for the penalty terms arising from the area and the perimeter pieces that

$$\sum_{\kappa=1}^m\ln a_\kappa-\sum_{\nu=1}^{m^\circ}\ln a_\nu^\circ>0\qquad\text{and}\qquad\sum_{\kappa=1}^m b_\kappa-\sum_{\nu=1}^{m^\circ}b_\nu^\circ>0$$

with probability approaching one as $n\to\infty$.

Proof. Lemma A.6 implies that the oversegmentation $\hat{\mathbf R}_m$ approximates the true segmentation $\mathbf R^\circ$ in the sense that, with probability approaching one, each perimeter $\partial R_\nu^\circ$ is uniformly approximated by one or more perimeters $\partial\hat R_\kappa$. This yields in particular that, for a suitable $\nu_\kappa\in\{1,\dots,m^\circ\}$, $P(\hat R_\kappa\subset R_{\nu_\kappa}^\circ)\to1$ for all $\kappa=1,\dots,m$. By assumption, we can write $a_\kappa=\lambda_{\kappa,\nu}a_{\nu_\kappa}^\circ$ with $\lambda_{\kappa,\nu}\to\alpha_\kappa/\alpha_{\nu_\kappa}^\circ$ as $n\to\infty$. Let $\mathcal V_\nu=\{\kappa:\hat R_\kappa\cap R_\nu^\circ\neq\emptyset\}$. Then, with probability approaching one,

$$\frac{\prod_{\kappa=1}^m a_\kappa}{\prod_{\nu=1}^{m^\circ}a_\nu^\circ}=\prod_{\nu=1}^{m^\circ}\Big\{(a_\nu^\circ)^{\#\mathcal V_\nu-1}\prod_{\kappa\in\mathcal V_\nu}\lambda_{\kappa,\nu}\Big\}\ge\Big(\min_\nu a_\nu^\circ\Big)^{m-m^\circ}\prod_{\nu=1}^{m^\circ}\prod_{\kappa\in\mathcal V_\nu}\lambda_{\kappa,\nu}>1,$$

since $\sum_\nu(\#\mathcal V_\nu-1)=m-m^\circ$, $a_\nu^\circ=\lfloor\alpha_\nu^\circ n\rfloor$, and the product over the $\lambda_{\kappa,\nu}$ converges to a finite positive limit as $n\to\infty$. This implies the first statement of the lemma. The second claim follows along similar lines from the fact that the true segmentation "shares" all its perimeters with the oversegmentation with probability approaching one. Since $m>m^\circ$, there must be at least one additional perimeter piece, and the assertion follows. □
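A small numerical sketch of the first claim of Lemma A.8, under hypothetical area fractions chosen for illustration ($\alpha_1^\circ=0.25$, $\alpha_2^\circ=0.75$, with the first true region split into two halves so that $m=3>m^\circ=2$): splitting a region with $a^\circ$ pixels into halves changes $\sum\ln a$ by $\ln(a^\circ/4)$, which is positive once $a^\circ>4$, so the inequality is only guaranteed for large $n$. The function `log_area_excess` and its argument layout are assumptions of this sketch.

```python
import math


def log_area_excess(n, alpha_true, pieces):
    """Return sum_k ln a_k - sum_nu ln a_nu^o.

    alpha_true[nu] is the area fraction of the true region R_nu^o, so that
    a_nu^o = floor(alpha_nu^o * n); pieces[nu] lists the fractions lambda of
    R_nu^o occupied by the estimated regions inside it (they should sum to 1),
    and each such region is given a_k = floor(lambda * a_nu^o) pixels.
    """
    total = 0.0
    for alpha_nu, fracs in zip(alpha_true, pieces):
        a_true = math.floor(alpha_nu * n)
        total -= math.log(a_true)
        for lam in fracs:
            total += math.log(math.floor(lam * a_true))
    return total


alphas = (0.25, 0.75)          # hypothetical true area fractions (m0 = 2)
pieces = ([0.5, 0.5], [1.0])   # first true region split in half, so m = 3
for n in (8, 64, 512 ** 2):
    print(n, round(log_area_excess(n, alphas, pieces), 3))
```

For these inputs the excess is negative at $n=8$ and positive at $n=64$ and beyond.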

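Lemma A.9. Let $\{y_i\}$ be the sequence of random variables defined in (3). If $m>m^\circ$, then

$$\Delta(m,m^\circ)=\frac n2\Big\{\ln\Big(\frac{\mathrm{RSS}_m}{n}\Big)-\ln\Big(\frac{\mathrm{RSS}_{m^\circ}}{n}\Big)\Big\}+(m-m^\circ)\ln n>0$$

with probability approaching one as $n\to\infty$.

Proof. Let $\epsilon>0$. By the law of large numbers, we have that $\mathrm{RSS}=\sum_{i=1}^n\varepsilon_i^2>n(\sigma^2-\epsilon)$ with probability approaching one. Also, $\mathrm{RSS}\ge\mathrm{RSS}_{m^\circ}$. Hence,

$$\begin{aligned}\Delta(m,m^\circ)&\ge\frac n2\Big\{\ln\Big(\frac{\mathrm{RSS}_m}{n}\Big)-\ln\Big(\frac{\mathrm{RSS}}{n}\Big)\Big\}+(m-m^\circ)\ln n\\&=\frac n2\ln\Big(\frac{\mathrm{RSS}_m}{\mathrm{RSS}}\Big)+(m-m^\circ)\ln n\\&\ge\frac n2\ln\Big\{1-\frac{L_n(\epsilon,\hat{\mathbf R})}{n(\sigma^2-\epsilon)}\Big\}+(m-m^\circ)\ln n,\end{aligned}$$

where the last inequality follows after an application of Lemma A.7. Continuing as in Yao (1988), using the fact that $\ln(1-x)\ge-x(1+\epsilon)$ for small positive $x$ and the definition of $L_n(\epsilon,\hat{\mathbf R})$, the right-hand side can be bounded from below by

$$-\frac{\sigma^2(1+\epsilon)}{2(\sigma^2-\epsilon)}\{\epsilon+2(m-m^\circ-1)(1+\epsilon)\}\ln n+(m-m^\circ)\ln n,\tag{15}$$

which is positive with probability approaching one whenever $\epsilon$ is sufficiently small. □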
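Dividing (15) by $\ln n$ leaves a coefficient that depends only on $\epsilon$, $\sigma^2$ and $k=m-m^\circ$. At $\epsilon=0$ it equals $-(k-1)+k=1$, so by continuity it stays positive for small $\epsilon$ for each fixed $k$. The hypothetical helper below evaluates this coefficient, which makes the phrase "whenever $\epsilon$ is sufficiently small" concrete.

```python
def lower_bound_coeff(eps, k, sigma2=1.0):
    """Coefficient of ln n in the lower bound (15), with k = m - m0 >= 1.

    Requires 0 <= eps < sigma2 so that the factor 1 / (sigma2 - eps) is defined
    and positive.
    """
    if not 0 <= eps < sigma2:
        raise ValueError("need 0 <= eps < sigma2")
    factor = sigma2 * (1 + eps) / (2 * (sigma2 - eps))
    return -factor * (eps + 2 * (k - 1) * (1 + eps)) + k


for eps in (0.0, 0.01, 0.1, 0.2):
    print(eps, [round(lower_bound_coeff(eps, k), 3) for k in (1, 2, 5, 10)])
```

Larger values of $k$ require a smaller $\epsilon$; for instance, the coefficient is negative at $\epsilon=0.2$, $k=10$.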

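Lemmas A.5, A.8 and A.9 together imply that $\hat m\to m^\circ$ in probability. The second claim of Theorem 3.2 follows from $P(\mathcal E_n)\ge P(\mathcal E_n,\hat m=m^\circ)\to1$, where $\mathcal E_n=\{\lambda^2(\mathbf R^\circ\triangle\hat{\mathbf R})=0\}$.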

A.3. Proofs for BIC and AIC segmentations. The counterparts of Theorem 3.1 for the AIC and BIC procedures are verbatim the same as for the MDL procedure. Consistency in the case of known $m=m^\circ$ therefore does not depend on the particular penalty terms.

The situation is, however, very different in the general case of an unknown number of segments in the partition. Here, we can prove the consistency result of Theorem 3.2 only for the BIC procedure. Following the lines of the proofs in Appendix A.2, it can be seen that Lemmas A.5-A.7 deal only with the RSS term and hold irrespective of the specific penalty term. Lemma A.8 deals with the complexity of areas and perimeters unique to the MDL criterion. The crucial point is therefore Lemma A.9. Repeating the arguments in its proof, one can similarly verify for the BIC criterion that, if $m>m^\circ$,

$$\Delta(m,m^\circ)=\frac n2\Big\{\ln\Big(\frac{\mathrm{RSS}_m}{n}\Big)-\ln\Big(\frac{\mathrm{RSS}_{m^\circ}}{n}\Big)\Big\}+(m-m^\circ)\ln n>0$$

with probability approaching one as $n\to\infty$, utilizing

$$-\frac{\sigma^2(1+\epsilon)}{2(\sigma^2-\epsilon)}\{\epsilon+2(m-m^\circ-1)(1+\epsilon)\}\ln n+(m-m^\circ)\ln n$$

instead of (15). This implies consistency of the BIC procedure. For the AIC segmentation, however, the second term in the last display becomes $2(m-m^\circ)$, which stays bounded in $n$ and therefore cannot outweigh the negative first term, which grows like $\ln n$. Hence AIC-based procedures are inconsistent if $m^\circ$ is unknown.
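The contrast between BIC and AIC can be made concrete by evaluating the two lower bounds as functions of $n$. In the sketch below (hypothetical helper names; $\sigma^2=1$ and $\epsilon=0.01$ are illustrative choices), the BIC bound is the last display with penalty difference $(m-m^\circ)\ln n$, while the AIC bound replaces that term by $2(m-m^\circ)$. The AIC bound eventually becomes negative as $n$ grows, because its negative part grows like $\ln n$ while $2k$ stays fixed. The code only evaluates these bounds; it illustrates why the argument of Lemma A.9 breaks down for AIC.

```python
import math


def _loss_term(k, eps, sigma2):
    """Negative part of both bounds, without the ln n factor:
    sigma2 (1 + eps) / (2 (sigma2 - eps)) * {eps + 2 (k - 1) (1 + eps)}."""
    return sigma2 * (1 + eps) / (2 * (sigma2 - eps)) * (eps + 2 * (k - 1) * (1 + eps))


def bic_bound(n, k, eps=0.01, sigma2=1.0):
    """Lower bound with the BIC penalty difference k ln n, k = m - m0."""
    return (-_loss_term(k, eps, sigma2) + k) * math.log(n)


def aic_bound(n, k, eps=0.01, sigma2=1.0):
    """Same bound with the AIC penalty difference 2k in place of k ln n."""
    return -_loss_term(k, eps, sigma2) * math.log(n) + 2 * k


for n in (10 ** 2, 10 ** 4, 512 ** 2, 10 ** 8):
    print(n, round(bic_bound(n, 3), 2), round(aic_bound(n, 3), 2))
```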

Acknowledgments. The authors are grateful to the reviewers and the 
Associate Editor for their most useful comments. 

REFERENCES 

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. 
Automat. Control AC-19 716-723. System identification and time-series analysis. 
MR0423716 

Baddeley, A. J. (1992). Errors in binary images and an $L^p$ version of the Hausdorff
metric. Nieuw Arch. Wisk. (4) 10 157-183. MR1218662

Darling, D. A. and Erdős, P. (1956). A limit theorem for the maximum of normalized
sums of independent random variables. Duke Math. J. 23 143-155. MR0074712

Glasbey, C. A. and Horgan, G. W. (1995). Image Analysis for the Biological Sciences.
Wiley, Chichester, New York.

Haralick, R. M. and Shapiro, L. G. (1992). Computer and Robot Vision. Addison- 
Wesley, Reading, MA. 

Kanungo, T., Dom, B., Niblack, W., Steele, D. and Sheinvald, J. (1995). MDL- 
based multi-band image segmentation using a fast region merging scheme. Technical 
Report RJ 9960 (87919), IBM Research Division. 

LaValle, S. M. and Hutchinson, S. A. (1995). A Bayesian segmentation methodology
for parametric image models. IEEE Trans. Pattern Anal. Mach. Intell. 17 211-217.

Leclerc, Y. G. (1989). Constructing simple stable descriptions for image partitioning.
Int. J. Comput. Vis. 3 73-102.

Lee, C.-B. (1997). Estimating the number of change points in exponential families dis- 
tributions. Scand. J. Stat. 24 201-210. MR1455867 

Lee, T. C. M. (1998). Segmenting images corrupted by correlated noise. IEEE Trans.
Pattern Anal. Mach. Intell. 20 481-492.

Lee, T. C. M. (2000). A minimum description length-based image segmentation pro- 
cedure, and its comparison with a cross-validation-based segmentation procedure. 
J. Amer. Statist. Assoc. 95 259-270. MR1803154 




Luo, Q. and Khoshgoftaar, T. M. (2006). Unsupervised multiscale color image seg- 
mentation based on MDL principle. IEEE Trans. Image Process. 15 2755-2761. 

Murtagh, F., Raftery, A. E. and Starck, J. L. (2005). Bayesian inference for multi- 
band image segmentation via model-based cluster trees. Image and Vision Computing 
23 587-596. 

Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific Series
in Computer Science 15. World Scientific, Teaneck, NJ. MR1082556

Rissanen, J. (2007). Information and Complexity in Statistical Modeling. Springer, New
York. MR2287233

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. 
MR0468014 

Stanford, D. C. and Raftery, A. E. (2002). Approximate Bayes factors for image
segmentation: The pseudolikelihood information criterion (PLIC). IEEE Trans.
Pattern Anal. Mach. Intell. 24 1517-1520.

Wang, J., Ju, L. and Wang, X. (2009). An edge-weighted centroidal Voronoi tessellation 
model for image segmentation. IEEE Trans. Image Process. 18 1844-1858. MR2750696 

Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz' criterion. Statist. 
Probab. Lett. 6 181-189. MR0919373 

Zhang, J. and Modestino, J. W. (1990). A model-fitting approach to cluster validation
with application to stochastic model-based image segmentation. IEEE Trans.
Pattern Anal. Mach. Intell. 12 1009-1017.

Zhu, S. C. and Yuille, A. (1996). Region competition: Unifying snakes, region growing,
and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 18 884-900.

Department of Statistics 
University of California at Davis 
4118 Mathematical Sciences Building 
One Shields Avenue 
Davis, California 95616 
USA 

E-MAIL: alexauc@wald.ucdavis.edu 
tcmlee@ucdavis.edu 



